[Bug 61862] New: Improve TreeMap implementation to cache first and last key

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug 61862] New: Improve TreeMap implementation to cache first and last key

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61862

            Bug ID: 61862
           Summary: Improve TreeMap implementation to cache first and last
                    key
           Product: POI
           Version: unspecified
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: SS Common
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

Created attachment 35589
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35589&action=edit
an example SortedMap subclass that caches the first key and last key, written
by Javen. No unit test.

POI has several TreeMaps to maintain the sorted order of cells in a row or rows
in a sheet.

Getting the last and first keys on those maps can be expensive (O(log N)) if
called frequently.

Let's investigate if any of these maps would benefit from a Map implementation
that cached the first and last key, making those keys available in O(1).

Most common example:
sheet.getLastRowNum()

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61862] Improve TreeMap implementation to cache first and last key

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61862

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|NEW                         |NEEDINFO

--- Comment #1 from Javen O'Neal <[hidden email]> ---
I have not tested this new caching TreeMap to see if it's faster or slower than
a regular TreeMap. TreeMap's O(log(N)) performance is probably pretty good.
The first or last key of a TreeMap containing 1,000,000 rows would traverse
through ~20 nodes (2^20=1,000,000).

The constant overhead of keeping the cache up to date for every modification
just to make firstKey/lastKey faster for getFirstRowNum() and getLastRowNum()
faster, which are probably called no more than once per sheet anyways... This
might not actually be an improvement for XSSFSheet.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61862] Improve TreeMap implementation to cache first and last key

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61862

--- Comment #2 from Greg Woolsey <[hidden email]> ---
How about caching the first and last keys and a "dirty" flag or clearing the
cache values on structural changes?  Then the first/last would only need to be
calculated on first access after changes.  I can see these being referenced
many times when evaluating formulas, for example, but we wouldn't want them to
be continuously updated as rows are being created.  Lazy init, eager
clearing/dirtying would perhaps still offer a formula performance gain without
adding much if any overhead when creating documents.

The only cases where it wouldn't add much value, and still wouldn't add much
overhead beyond what TreeMap already does, would be cases where data changes
and formula evaluations/data scanning are done repeatedly or in some
iterative/interleaved pattern.  In my experience these are not typical use
cases.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61862] Improve TreeMap implementation to cache first and last key

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61862

--- Comment #3 from Greg Woolsey <[hidden email]> ---
All said and done, however, the Map interface has so many possible mutation
vectors, including submaps and iterators, it likely isn't worth it.  Caching in
the evaluation structures is I think sufficient.  There we already have a
required manual method for clearing cached values when data changes.  As you
mentioned, most use cases where users iterate over things like rows, it's done
once per sheet, and the cost is negligible.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]