[Bug 61213] New: Replace SXSSFWorkbook copyStreamAndInjectWorksheet with StAX equivalent

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61213

            Bug ID: 61213
           Summary: Replace SXSSFWorkbook copyStreamAndInjectWorksheet
                    with StAX equivalent
           Product: POI
           Version: unspecified
          Hardware: PC
                OS: Mac OS X 10.1
            Status: NEW
          Severity: normal
          Priority: P2
         Component: SXSSF
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

I have been looking at a replacement. I will attach the code shortly. I would
like to get it reviewed before merging it.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

--- Comment #1 from PJ Fanning <[hidden email]> ---
Created attachment 35073
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35073&action=edit
Use StAX to parse the worksheet data.

I can merge this if it is OK.

PJ Fanning <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #35073|0                           |1
        is obsolete|                            |

--- Comment #2 from PJ Fanning <[hidden email]> ---
Created attachment 35074
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35074&action=edit
reload the patch.tar.gz

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
           Severity|normal                      |enhancement

--- Comment #3 from Dominik Stadler <[hidden email]> ---
I took a quick look. We currently just copy the XML stream as text; with this
change we would parse the XML and then write it out again via XML
serialization. Do you have an idea of how much impact that has for very large
files?

SXSSF is used specifically for handling huge files (customers seem to have
documents with more than 4GB uncompressed size and multiple millions of
rows), so we need to check that this additional parsing/serializing is not
slower for such large files.
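
To make the trade-off concrete, here is a minimal sketch of the two copy
strategies being compared. It is not the code from the attachment and does not
reproduce what SXSSFWorkbook does internally; the class and method names
(CopyStrategies, copyRaw, copyStax) are hypothetical, and the sample XML is a
simplified, namespace-free stand-in for real worksheet data. It uses only the
JDK's javax.xml.stream API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical illustration class; not part of Apache POI.
final class CopyStrategies {

    /** Raw copy: bytes pass through without being interpreted as XML. */
    static void copyRaw(InputStream in, OutputStream out) {
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** StAX copy: every event is parsed and then serialized again. */
    static void copyStax(InputStream in, OutputStream out) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
            writer.add(reader); // forwards all remaining events from the reader
            writer.flush();
            writer.close();
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML round-trip failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<sheetData><row r=\"1\"><c r=\"A1\"><v>1</v></c></row></sheetData>"
                .getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        copyRaw(new ByteArrayInputStream(xml), raw);

        ByteArrayOutputStream stax = new ByteArrayOutputStream();
        copyStax(new ByteArrayInputStream(xml), stax);

        System.out.println(new String(raw.toByteArray(), StandardCharsets.UTF_8));
        System.out.println(new String(stax.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The raw copy is byte-identical to its input. The StAX copy carries the same
content but may differ in details such as the XML declaration, so the extra
work per event is the cost in question here.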

--- Comment #4 from PJ Fanning <[hidden email]> ---
Thanks Dominik - I would expect some performance impact, but I think it is more
robust for the code not to make assumptions about file encodings etc. I also
think the StAX code is easier to understand.
StAX parsers are very fast, but it is worth evaluating the impact to see if it
is excessive.
Since SXSSFWorkbook is for writing large files, I think the best performance
test would be for me to write a test case that adds a large number of rows and
compares the times for the existing code and my proposed change.
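
A rough sketch of what such a timing comparison could look like, using only
the JDK so it runs without POI on the classpath. This is an assumption-laden
stand-in, not the test case described above: it times a plain byte copy
against a StAX parse-and-reserialize pass over synthetic, namespace-free sheet
XML, and the names CopyTiming, syntheticSheet, copyRaw, copyStax and
timeMillis are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical micro-benchmark harness; not part of Apache POI.
final class CopyTiming {

    /** Builds synthetic sheet XML with the given number of one-cell rows. */
    static byte[] syntheticSheet(int rows) {
        StringBuilder sb = new StringBuilder("<worksheet><sheetData>");
        for (int r = 1; r <= rows; r++) {
            sb.append("<row r=\"").append(r).append("\"><c r=\"A").append(r)
              .append("\"><v>").append(r).append("</v></c></row>");
        }
        sb.append("</sheetData></worksheet>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Baseline: copy the bytes without parsing them. */
    static byte[] copyRaw(byte[] xml) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(xml.length);
        out.write(xml, 0, xml.length);
        return out.toByteArray();
    }

    /** Candidate: parse with StAX and serialize every event again. */
    static byte[] copyStax(byte[] xml) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(xml.length);
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new ByteArrayInputStream(xml));
            XMLEventWriter writer = XMLOutputFactory.newInstance()
                    .createXMLEventWriter(out, "UTF-8");
            writer.add(reader);
            writer.flush();
            writer.close();
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML round-trip failed", e);
        }
        return out.toByteArray();
    }

    /** Runs the task once and returns elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        byte[] sheet = syntheticSheet(200_000);
        // Warm up both paths so JIT compilation is not charged to either one.
        copyRaw(sheet);
        copyStax(sheet);
        System.out.println("raw copy:  " + timeMillis(() -> copyRaw(sheet)) + " ms");
        System.out.println("StAX copy: " + timeMillis(() -> copyStax(sheet)) + " ms");
    }
}
```

A single timed run like this is only indicative. A more careful comparison
would repeat each measurement several times and use the real SXSSFWorkbook
write path rather than synthetic data.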

--- Comment #5 from Dominik Stadler <[hidden email]> ---
You can take a look at the FAQ at http://poi.apache.org/faq.html#faq-N10165, it
points to a sample which we used for comparing raw performance of
HSSF/XSSF/SXSSF in the past.

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |60707


Referenced Bugs:

https://bz.apache.org/bugzilla/show_bug.cgi?id=60707
[Bug 60707] [PATCH] Reading very large excel files using StAX made easier.

--- Comment #6 from PJ Fanning <[hidden email]> ---
I did some initial testing and the StAX-based code is significantly slower. I
will spend a little more time to see if the performance can be improved.
The test is at https://github.com/pjfanning/poi-sxssf-stax. It is not very
scientific, but it takes 3 seconds with the existing SXSSFWorkbook and 25
seconds with the StAX equivalent.

PJ Fanning <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #7 from PJ Fanning <[hidden email]> ---
This approach is much slower
