[Bug 61832] New: Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug 61832] New: Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

            Bug ID: 61832
           Summary: Unable to create a excel of 500,000 rows and 150
                    column using SXSSF workbook
           Product: POI
           Version: unspecified
          Hardware: PC
                OS: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: SXSSF
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

Created attachment 35565
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35565&action=edit
"GenerateExcelFile.txt" contains the java code.

I am trying to generate a excel with SXSSF workbook. The size of excel may
shoot to 200-300MB, whose row count will be 500,000 (i.e 0.5 million) and
column count will be around 150 approximately..! I get content issue while
opening such large excel file.
The version which I am using is APACHE POI 3.9
I had also tried with, 3.15 the same issue exist.

System holds enough ram and disk space as per the requirement.

3.9 performance seems to better than other versions, Is there any way to rule
this issue out in that version.?

Anyone, who has idea on above scenario, kindly help me to proceed further.

FYI, I hadn't applied any styles or format for the cells, just inserted a data
of 25 character length. I hadn't violated/crossed any excel
specification/limits.

Please find the sample code as below.

The alert message which I receive on opening the excel file is,

"We found a problem with some content in 'filename'. Do you want us to try to
recover as much as we can? If you trust the source of this workbook"

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61832] Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

--- Comment #1 from Thamodharan <[hidden email]> ---
Created attachment 35566
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35566&action=edit
Excel 2013 - Alert Message - While opening such large file

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61832] Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

--- Comment #2 from Yegor Kozlov <[hidden email]> ---
(In reply to Thamodharan from comment #1)
> Created attachment 35566 [details]
> Excel 2013 - Alert Message - While opening such large file

Is it a 32 or 64-bit Excel ? I wonder if it matters.

What happens if you click Yes and agree to recover as much as possible ?

Also, try to disable automatic re-calculation of formulas in Options / Formulas
/ Calculation Options. By default, Excel recalculates all the formulas in a
worksheet on open. You'd better turn it off for such large files.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61832] Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

--- Comment #3 from Thamodharan <[hidden email]> ---
(In reply to Yegor Kozlov from comment #2)

> (In reply to Thamodharan from comment #1)
> > Created attachment 35566 [details]
> > Excel 2013 - Alert Message - While opening such large file
>
> Is it a 32 or 64-bit Excel ? I wonder if it matters.
>
> What happens if you click Yes and agree to recover as much as possible ?
>
> Also, try to disable automatic re-calculation of formulas in Options /
> Formulas / Calculation Options. By default, Excel recalculates all the
> formulas in a worksheet on open. You'd better turn it off for such large
> files.

Hi Yegor Kozlov,

FYI,
The above attached code works good, if the number of rows has been reduced to
0.3 million rows (i.e 300,000) and 150 column. I can open the file without any
issue or error. But the same doesn't happen for 0.4 or 0.5 million rows.


Response to your questions.,

 1)  Its 32-bit Excel 2013.

 2)  When I click Yes to recover, excel stops responding few minutes and later
i am able to retrieve all data without any loss, but I get repair error report
in an alert box. Please find new attachments related to it.

     - Once, I save the repaired file, I didn't get the content issue pop up
when I open it again or later.

 3)  We just tried to insert string, no numeric or formulas are applied to the
workbook. Though, I tried your suggestion.

     - Before generating the workbook, I turned off the formula in local excel
file and saved it, checked once again by opening, it remains turned off. But
once the code is executed successfully, the formula is again enabled
automatically. Same Content issue persist for large files.

Kindly help if there is any further solution.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61832] Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

--- Comment #4 from Thamodharan <[hidden email]> ---
Created attachment 35571
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35571&action=edit
ExcelErrorAfterRepair

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61832] Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

--- Comment #5 from Yegor Kozlov <[hidden email]> ---
Can you please do one more test: generate a 500K rows x 150 columns spreadsheet
containing only numbers and see if it opens? Due to its streaming nature SXSSF
stores strings differently than Excel does: Excel puts strings in a shared
cache while SXXSF stores strings inline. Number cells should be fully
compatible . Lets see.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61832] Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

--- Comment #6 from Thamodharan <[hidden email]> ---
(In reply to Yegor Kozlov from comment #5)
> Can you please do one more test: generate a 500K rows x 150 columns
> spreadsheet containing only numbers and see if it opens? Due to its
> streaming nature SXSSF stores strings differently than Excel does: Excel
> puts strings in a shared cache while SXXSF stores strings inline. Number
> cells should be fully compatible . Lets see.

Hi Yegor Kozlov,

I had tried as you suggested, it worked fine for 500K X 150 column while
inserting numbers.The maximum int limit which we can give is 2147483647.
cell.setCellValue(2147483647);
Thus it worked fine.

So, I extended the testing by increasing the row count and column count,
i.e., 900K X 250 Column. Same cellValue 2147483647.
The workbook is done successful. File size reached to 504 MB.

But,The same content issue occurs while opening.

Still as a mystery.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61832] Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

--- Comment #7 from Greg Woolsey <[hidden email]> ---
I ran the attached code, and examined the generated workbook.  All XML
validates successfully with xmllint.  The worksheet file, however, expands to >
5GB due to all the inlineStr attributes for all the cells.

My suspicion is that Excel doesn't like that many inline cell string values.  I
verified that the change between the original and "repaired" versions was
exactly turning those inlineStr attributes into a single shared string value
referenced by all the cells.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61832] Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

--- Comment #8 from Thamodharan <[hidden email]> ---
Hi Greg Woolsey,

SXSSF by default takes inline string, it can be changed to shared string for
which some patch is available. Since Shared string takes the heap space memory,
its difficult to opt for it, due to resource constraint. Is there any luck, to
get it done with SXSSF inline itself ?

Regards
Thamodharan B

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61832] Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

--- Comment #9 from Yegor Kozlov <[hidden email]> ---
In theory SXSSF can use a fixed-size cache of shared strings, e.g. we can allow
SXSSF to put 10K distinct strings in the SST. If the cache is full then the
string goes inline. The 10K limit can be configurable.
This way SXSSF-generated workbooks with small number of distinct values will be
fully compatible with XSSF.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61832] Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

--- Comment #10 from Thamodharan <[hidden email]> ---
Hi Yegor Kozlov,

Hope, you mean to set this limit 10K as mentioned below.

SXSSFWorkbook wb = new SXSSFWorkbook(10000);

Though I set this, I face content issue while opening it.
I tried extending it to 30K, the same persist.

If so my try is wrong from what you suggest. Kindly correct it.

I have limited heap space, but have enough disk space. Though I plan to extend
the memory, and store in SST, I cant run parallel workbook. I can do only one,
though i increase twice or thrice as of now.

My system RAM is 8GB, I had allotted 1 GB max to jboss.

Kindly suggest your view on it.

Regards
Thamodharan B

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 61832] Unable to create a excel of 500,000 rows and 150 column using SXSSF workbook

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

--- Comment #11 from Yegor Kozlov <[hidden email]> ---
try to construct SXSSFWorkbook in a different way:

  wb = new SXSSFWorkbook(
    null, /* template workbook. pass null to start with a blank workbook */
    1000, /* size of the sliding window*/
    true, /* whether to compress temp files.*/
    true  /* whether to use shared string table*/
  );



not that it may increase the memory footprint, especially if the number of
distinct strings is large .

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]