[Bug 60397] New: very slow Cell Merge for SXSSFWorkbook

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug 60397] New: very slow Cell Merge for SXSSFWorkbook

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60397

            Bug ID: 60397
           Summary: very slow Cell Merge for SXSSFWorkbook
           Product: POI
           Version: 3.15-FINAL
          Hardware: Macintosh
            Status: NEW
          Severity: normal
          Priority: P2
         Component: SXSSF
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

When merging the cells for a SXSSFWorkbook the rendering speed is drastically
reduced. This can be demonstrated by adapting the sample code to merge each of
the cells.

        SXSSFWorkbook wb = new SXSSFWorkbook(100);
        Sheet sh = wb.createSheet();
        for(int rownum = 0; rownum < 10000; rownum = rownum + 2){
            Row row1 = sh.createRow(rownum);
            Row row2 = sh.createRow(rownum + 1);
            for(int cellnum = 0; cellnum < 10; cellnum++){
                Cell cell1 = row1.createCell(cellnum);
                String address = new CellReference(cell1).formatAsString();
                cell1.setCellValue(address);

                Cell cell2 = row2.createCell(cellnum);
                cell2.setCellValue("");

                sh.addMergedRegion(new CellRangeAddress(
                        rownum,
                        rownum + 1,
                        cellnum,
                        cellnum  
                ));
            }
        }

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60397] very slow Cell Merge for SXSSFWorkbook

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60397

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #1 from Javen O'Neal <[hidden email]> ---
Because merged regions cannot overlap without producing a corrupt document, POI
may be checking the list of merged regions on a sheet for potential
intersections before adding a merged region. This gives O(N) behavior for
adding one region instead of the expected O(1).

For XSSF and SXSSF, we added addMergedRegionUnsafe [1] that skips these checks
for speed, but may produce a corrupt document when opened in Excel. Have you
tried addMergedRegionUnsafe?

[1]
https://poi.apache.org/apidocs/org/apache/poi/xssf/streaming/SXSSFSheet.html#addMergedRegionUnsafe(org.apache.poi.ss.util.CellRangeAddress)

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60397] very slow Cell Merge for SXSSFWorkbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60397

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60397] very slow Cell Merge for SXSSFWorkbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60397

Marc <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #2 from Marc <[hidden email]> ---
Hi Javen,

Thanks for the info - the unsafe method is definitely faster (and I'll update
all our code to use it, as there are tests in place to ensure the spreadsheets
are rendered correctly - so corruption shouldn't be an issue). However, it's
still a lot slower than not using any merged cells. I've run the following
benchmark tests to demonstrate:

2000 Rows

- No merging -> 1 second
- Unsafe merging -> 2 seconds
- Safe merging -> 33 seconds

10,000 Rows

- No merging -> 2 seconds
- Unsafe merging -> 39 seconds
- Safe merging - > hadn't completed after 5 mins, so gave up :-)

Cheers,

Marc

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60397] very slow Cell Merge for SXSSFWorkbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60397

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #3 from Javen O'Neal <[hidden email]> ---
Here's the implementation [1]. Looks fairly benign, though there could be
performance problems in method calls.

Can you test getNumberOfCells and formatAsString to see if either of these are
the cause of the 39 second performance?
for (CellRangeAddress region : regionsToAdd) {
    region.getNumberOfCells();
    region.formatAsString();
}

[1]
https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFSheet.java?revision=1768589&view=markup#l347

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60397] very slow Cell Merge for SXSSFWorkbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60397

--- Comment #4 from Javen O'Neal <[hidden email]> ---
It would probably be faster for us to read the merged regions into a java List
and discard the CTMergedRegions, operate on the Java List, and then recreate
the CTMergedRegions XML nodes when writing the workbook. Not sure whether this
would improve the speed here, but I'm guessing worksheet.getMergeCells() and
ctMergeCells.addNewMergeCell() are expensive calls since it's creating and
linking XML nodes in a DOM. This work could be deferred to when the workbook is
written out.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60397] very slow Cell Merge for SXSSFWorkbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60397

Stephen Webster <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #5 from Stephen Webster <[hidden email]> ---
I have been watching this bug, and wanted to add some information that I have
gathered since I didn't see any progress.  Here is my testcase.

for(int i=0; i < 20000; i++) {
   for(int j=0; j < 30; j+=3) {
      CellRangeAddress range = new CellRangeAddress(i,i,j,j+2);
      sheet.addMergedRegionUnsafe(range);
   }
}

The results get quite a bit worse as the number of rows/merged region number
increases.

1000  rows -> 1.2 seconds
10000 rows -> 100 seconds
20000 rows -> Over 5 minutes.

I dropped the above code and profiled it and turns out the time spent is in the
return statement of addMergedRegion(), ctMergeCells.sizeOfMergeCellArray();

I am not using the return value of this method, after creating a locally
modified version of the XSSFSheet class, I created a method without the return
statement and the testcase finishes in less then 1 second instead of over 5
minutes for 20,000 rows.

Here is the stack from profiler which was 90% of time spent
Stack Trace
org.apache.xmlbeans.impl.store.Locale.count(Xobj, QName, QNameSet)
org.apache.xmlbeans.impl.store.Xobj.count_elements(QName)      
org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTMergeCellsImpl.sizeOfMergeCellArray()
org.apache.poi.xssf.usermodel.XSSFSheet.addMergedRegion(CellRangeAddress,
boolean)
org.apache.poi.xssf.usermodel.XSSFSheet.addMergedRegionUnsafe(CellRangeAddress)

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60397] very slow Cell Merge for SXSSFWorkbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60397

--- Comment #6 from Greg Woolsey <[hidden email]> ---
That makes sense.  All XmlBeans methods are ridiculously expensive.  Javen's
suggestion of using temporary POJOs to avoid multiple bean operations seems
like a good one, as well as a new method with a void return for cases that
don't need it, like yours.

Many other places we've elected to track data outside of the bean structures
for performance reasons, especially in things like the *Evaluator classes.
This sounds like another candidate.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60397] very slow Cell Merge for SXSSFWorkbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60397

Nail Samatov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #7 from Nail Samatov <[hidden email]> ---
Created attachment 36284
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36284&action=edit
Add a method that doesn't calculate merge cell array

I reproduced the same issue as Stephen Webster and found that adding a new
method that doesn't calculate a merge cell array solves the issue for me. It
would help a lot if you add such method.

I attached a patch that should solve the issue, could you apply it if it's ok?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]