[Bug 61665] New: XSSF is much slower than HSSF

[Bug 61665] New: XSSF is much slower than HSSF

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61665

            Bug ID: 61665
           Summary: XSSF is much slower than HSSF
           Product: POI
           Version: 3.17-FINAL
          Hardware: PC
            Status: NEW
          Severity: critical
          Priority: P2
         Component: XSSF
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

When writing a large number of cells (10000-30000), XSSF is much slower than
HSSF, by roughly a factor of 5 to 10. In general that should not be the case,
or at least the gap should not be that large.

It can be reproduced with Apache POI's SSPerformanceTest class:
https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/ss/examples/SSPerformanceTest.java.

According to another post on Stack Overflow:
https://stackoverflow.com/questions/34246083/apache-poi-performance
the problem may lie not in POI itself but in synchronized calls within xmlbeans
and poi-ooxml-schemas.

Please also take a look at these messages:

https://stackoverflow.com/questions/34995058/apache-poi-much-quicker-using-hssf-than-xssf-what-next

http://apache-poi.1045710.n5.nabble.com/Performance-Issue-with-XSSF-as-compared-to-HSSF-in-POI-3-7-td3307475.html

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[Bug 61665] XSSF is much slower than HSSF

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61665

Travis Burtrum <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #1 from Travis Burtrum <[hidden email]> ---
I ran into this too and settled on hacking HSSF to calculate over more
cells/rows, though it still can't read or write them:

https://lists.apache.org/thread.html/0bc90a3ed386edddfcb9b93ce6c262ad145a6b0433d0fcfe70ef10a2@%3Cdev.poi.apache.org%3E

There was also a recent change to disable synchronization in XmlBeans, intended
to avoid this, but I tested it and it made no difference.


[Bug 61665] XSSF is much slower than HSSF

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61665

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|critical                    |enhancement

--- Comment #2 from Javen O'Neal <[hidden email]> ---
XML as a serialization and deserialization format will always be slower than an
optimized binary format. Apache POI's internal model for an xlsx file
maintains XML beans, updating them as needed, writing out the XML beans as is.
The benefit of this strategy is that features that POI doesn't understand or
implement are kept, unmodified. Had we converted the information in the XML
beans to pojos and discarded the XML beans immediately after reading the
workbook, it's likely information would have been lost.
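
The trade-off described above can be sketched in a few lines. This is a toy
model only, not POI or XMLBeans code: a Map stands in for the attributes of one
parsed XML element, and the names RoundTripSketch, CellPojo, "v" and
"futureFeature" are all hypothetical. It compares editing through a POJO that
knows a single attribute with editing the parsed data in place.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: a Map stands in for the attributes of one parsed XML element. */
final class RoundTripSketch {

    /** Hypothetical plain object that understands a single attribute, "v". */
    static final class CellPojo {
        String value;

        CellPojo(String value) {
            this.value = value;
        }
    }

    /** Convert to a POJO, drop the parsed data, then serialize from the POJO. */
    static Map<String, String> editViaPojo(Map<String, String> parsed, String newValue) {
        CellPojo pojo = new CellPojo(parsed.get("v")); // only the understood attribute is copied
        pojo.value = newValue;
        Map<String, String> out = new LinkedHashMap<>();
        out.put("v", pojo.value); // anything the POJO did not model is lost here
        return out;
    }

    /** Keep the parsed data as the backing store and update it in place. */
    static Map<String, String> editInPlace(Map<String, String> parsed, String newValue) {
        Map<String, String> out = new LinkedHashMap<>(parsed); // everything that was read is kept
        out.put("v", newValue);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        parsed.put("v", "1");
        parsed.put("futureFeature", "x"); // hypothetical attribute the model does not understand

        System.out.println(editViaPojo(parsed, "2")); // prints {v=2}
        System.out.println(editInPlace(parsed, "2")); // prints {v=2, futureFeature=x}
    }
}
```

The in-place variant carries the unknown attribute through unchanged, at the
cost of keeping the full parsed structure around for the lifetime of the edit.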

We are investigating replacing XMLBeans with a different XML library
(constrained by ASL 2.0 license compatibility) that may be more performant and
memory efficient, and this may provide some improvements in speed. This is an
extremely large task that requires modifying nearly every XSSF class and OOXML
class. Any help would be greatly appreciated.

On a smaller scale, if after profiling the code you find a section that can be
improved, please submit your profiling results and a patch that doesn't break
backwards compatibility.


[Bug 61665] XSSF is much slower than HSSF

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61665

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WONTFIX
             Status|NEW                         |RESOLVED

--- Comment #3 from Dominik Stadler <[hidden email]> ---
I did some analysis using Dynatrace AppMon and could not find any immediate
items that we can improve here. The top-consumers are all deep down somewhere
in XmlBeans, therefore I don't think we can do much here outside of the larger
XmlBeans replacement work.

getT()                    6.37s   CPU: 36 %, Sync: 0 %, Wait: 0 %, Suspension: 0 %, I/O: 64 %
    org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTCellImpl    Openxmlformats

hasTextEnsureOccupancy()  6.37s   CPU: 36 %, Sync: 0 %, Wait: 0 %, Suspension: 0 %, I/O: 64 %
    org.apache.xmlbeans.impl.store.Xobj    XMLBeans

embedCurs()               6.37s   CPU: 36 %, Sync: 0 %, Wait: 0 %, Suspension: 0 %, I/O: 64 %
    org.apache.xmlbeans.impl.store.Locale    XMLBeans


I have updated the FAQ entry slightly to adjust the expected timings via
r1819415.

One thing to note is that class loading initially took a considerable amount
of time. I therefore added a way to do a "warmup" run to SSPerformanceTest so
that the actual code is measured rather than class loading; see r1819417.
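
The warmup idea can be sketched independently of POI. The class below is not
the code from SSPerformanceTest; WarmupTimer and timeNanos are hypothetical
names, and the string-building loop is only a stand-in for the place where a
cell-writing loop would go. It runs the workload a few times untimed, so that
class loading and JIT compilation happen first, and then reports the best of
several timed runs.

```java
/** Hypothetical helper: times a workload after a number of untimed warmup runs. */
final class WarmupTimer {

    /**
     * Runs the workload warmupRuns times without timing it, then measuredRuns times
     * with timing, and returns the shortest measured duration in nanoseconds.
     */
    static long timeNanos(Runnable workload, int warmupRuns, int measuredRuns) {
        if (measuredRuns < 1) {
            throw new IllegalArgumentException("measuredRuns must be at least 1");
        }
        for (int i = 0; i < warmupRuns; i++) {
            workload.run(); // untimed: absorbs class loading and early JIT work
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            workload.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): replace with the code under test.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 30_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("workload produced no output");
            }
        };

        long cold = timeNanos(workload, 0, 1); // first run, includes startup costs
        long warm = timeNanos(workload, 3, 5); // measured after warmup
        System.out.printf("cold: %d us, warm: %d us%n", cold / 1_000, warm / 1_000);
    }
}
```

Taking the minimum of the timed runs is one simple way to reduce noise; a
dedicated harness such as JMH would be the more rigorous choice for real
measurements.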
