Apache POI: Content Issue while processing huge excel file with SXSSF

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Apache POI: Content Issue while processing huge excel file with SXSSF

Thamodharan Balraj
Hi Everyone,

I am trying to generate a excel with SXSSF workbook. The size of excel may
shoot to 200-300MB, whose row count will be 5,00,000 and column count will
be around 150 approximately..! I get content issue while opening such large
excel file.
The version which I am using is APACHE POI 3.9
I had also tried with, 3.15 the same issue exist.

System holds enough ram and disk space as per the requirement.

3.9 performance seems to better than other versions, Is there any way to
rule this issue out in that version.?

Anyone, who has idea on above scenario, kindly help me to proceed further.

Regards
Thamodharan B
Reply | Threaded
Open this post in threaded view
|

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Andreas Reichel
Thamodharam,

we use Apache POI 3.18 development snapshot and have no problem to
create SXSSF workbooks of 300 MByte, with up to 1 Mill. Rows and around
40 columns. On the server, that took up to 20 GB of RAM.
However, we were never able to open such files with Libre Office,
Gnumeric or Excel (while plenty of resources were available on the
client computer). The spreadsheet programs just freeze.

The interesting part is: We exactly the same framework, we are good to
create SXSSF workbooks of 100 MByte (or less) which actually open in
Libre Office and Gnumeric and Excel quite fast.

We are not sure, if there is a limitation in the Spread Sheet programs
or Apache POI hits a problem.

Cheers


On Wed, 2017-11-22 at 21:05 +0530, Thamodharan Balraj wrote:

> Hi Everyone,
>
> I am trying to generate a excel with SXSSF workbook. The size of excel may
> shoot to 200-300MB, whose row count will be 5,00,000 and column count will
> be around 150 approximately..! I get content issue while opening such large
> excel file.
> The version which I am using is APACHE POI 3.9
> I had also tried with, 3.15 the same issue exist.
>
> System holds enough ram and disk space as per the requirement.
>
> 3.9 performance seems to better than other versions, Is there any way to
> rule this issue out in that version.?
>
> Anyone, who has idea on above scenario, kindly help me to proceed further.
>
> Regards
> Thamodharan B
Reply | Threaded
Open this post in threaded view
|

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Greg Woolsey
In reply to this post by Thamodharan Balraj
The OOXML/Excel 2007+ limit is 2^20 (1,048,576) rows per sheet.  If you are
trying to put 5 million rows on a single sheet, Excel will never open it.

POI streaming may not catch it (it should, but maybe that's the bug), but
OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will consider it
corrupt.

I've written out workbooks with < 1M rows in the past, without using the
streaming format even.  I've just had to notice when I reached the max # on
a sheet and start a new sheet at that point.

See the properties of  org.apache.poi.ss.SpreadsheetVersion.EXCEL2007 enum
for version specific limits.

On Wed, Nov 22, 2017 at 7:35 AM Thamodharan Balraj <[hidden email]>
wrote:

> Hi Everyone,
>
> I am trying to generate a excel with SXSSF workbook. The size of excel may
> shoot to 200-300MB, whose row count will be 5,00,000 and column count will
> be around 150 approximately..! I get content issue while opening such large
> excel file.
> The version which I am using is APACHE POI 3.9
> I had also tried with, 3.15 the same issue exist.
>
> System holds enough ram and disk space as per the requirement.
>
> 3.9 performance seems to better than other versions, Is there any way to
> rule this issue out in that version.?
>
> Anyone, who has idea on above scenario, kindly help me to proceed further.
>
> Regards
> Thamodharan B
>
Reply | Threaded
Open this post in threaded view
|

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Andreas Reichel

On Wed, 2017-11-22 at 19:39 +0000, Greg Woolsey wrote:
> POI streaming may not catch it (it should, but maybe that's the bug),
> but
>
> OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will
> consider it
>
> corrupt.

We have hit that limit and POI throws an exception (both in XSSF and
SXSSF).
The mysterious "Excel does not open" happens with files below the row
limit, when the number of cells exceed a certain number. Not sure
though what exactly that limit is.

Cheers

Reply | Threaded
Open this post in threaded view
|

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Greg Woolsey
Perhaps there are some other less obvious limits POI isn't checking.  There
is quite a list at

https://support.office.com/en-us/article/Excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3

On Wed, Nov 22, 2017 at 11:51 AM Andreas Reichel <
[hidden email]> wrote:

>
> On Wed, 2017-11-22 at 19:39 +0000, Greg Woolsey wrote:
> > POI streaming may not catch it (it should, but maybe that's the bug),
> > but
> >
> > OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will
> > consider it
> >
> > corrupt.
>
> We have hit that limit and POI throws an exception (both in XSSF and
> SXSSF).
> The mysterious "Excel does not open" happens with files below the row
> limit, when the number of cells exceed a certain number. Not sure
> though what exactly that limit is.
>
> Cheers
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Thamodharan Balraj
In reply to this post by Greg Woolsey
Hi Greg Woolsey,

I am not trying to put 5 million record in a sheet, I just tried with 0.5 million rows I.e (500,000 rows )with 150 column.

As you also mentioned, I had refered the limits, I hadn't crossed/violated any specification limits.

Even I didn't apply any styles or format. 
I just iterate and write a comman data of length 25 character in all the cells.

The workbook is done successful. But while opening it, I face the content issue error. Please find the screenshot attached with the mail regarding to it. 

Kindly help, if any one has idea on it. 

Regards
Thamodharan B



On Nov 23, 2017 1:09 AM, "Greg Woolsey" <[hidden email]> wrote:
The OOXML/Excel 2007+ limit is 2^20 (1,048,576) rows per sheet.  If you are
trying to put 5 million rows on a single sheet, Excel will never open it.

POI streaming may not catch it (it should, but maybe that's the bug), but
OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will consider it
corrupt.

I've written out workbooks with < 1M rows in the past, without using the
streaming format even.  I've just had to notice when I reached the max # on
a sheet and start a new sheet at that point.

See the properties of  org.apache.poi.ss.SpreadsheetVersion.EXCEL2007 enum
for version specific limits.

On Wed, Nov 22, 2017 at 7:35 AM Thamodharan Balraj <[hidden email]>
wrote:

> Hi Everyone,
>
> I am trying to generate a excel with SXSSF workbook. The size of excel may
> shoot to 200-300MB, whose row count will be 5,00,000 and column count will
> be around 150 approximately..! I get content issue while opening such large
> excel file.
> The version which I am using is APACHE POI 3.9
> I had also tried with, 3.15 the same issue exist.
>
> System holds enough ram and disk space as per the requirement.
>
> 3.9 performance seems to better than other versions, Is there any way to
> rule this issue out in that version.?
>
> Anyone, who has idea on above scenario, kindly help me to proceed further.
>
> Regards
> Thamodharan B
>




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Greg Woolsey
Can you open an issue at

https://bz.apache.org/bugzilla/enter_bug.cgi?product=POI

with sample code and/or a sample workbook showing the problem?  That would
help us reproduce it and track anything that requires a code change.

On Tue, Nov 28, 2017 at 3:38 AM Thamodharan Balraj <[hidden email]>
wrote:

> Hi Greg Woolsey,
>
> I am not trying to put 5 million record in a sheet, I just tried with 0.5
> million rows I.e (500,000 rows )with 150 column.
>
> As you also mentioned, I had refered the limits, I hadn't crossed/violated
> any specification limits.
>
> Even I didn't apply any styles or format.
> I just iterate and write a comman data of length 25 character in all the
> cells.
>
> The workbook is done successful. But while opening it, I face the content
> issue error. Please find the screenshot attached with the mail regarding to
> it.
>
> Kindly help, if any one has idea on it.
>
> Regards
> Thamodharan B
>
>
>
> On Nov 23, 2017 1:09 AM, "Greg Woolsey" <[hidden email]> wrote:
>
> The OOXML/Excel 2007+ limit is 2^20 (1,048,576) rows per sheet.  If you are
> trying to put 5 million rows on a single sheet, Excel will never open it.
>
> POI streaming may not catch it (it should, but maybe that's the bug), but
> OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will consider it
> corrupt.
>
> I've written out workbooks with < 1M rows in the past, without using the
> streaming format even.  I've just had to notice when I reached the max # on
> a sheet and start a new sheet at that point.
>
> See the properties of  org.apache.poi.ss.SpreadsheetVersion.EXCEL2007 enum
> for version specific limits.
>
> On Wed, Nov 22, 2017 at 7:35 AM Thamodharan Balraj <
> [hidden email]>
> wrote:
>
> > Hi Everyone,
> >
> > I am trying to generate a excel with SXSSF workbook. The size of excel
> may
> > shoot to 200-300MB, whose row count will be 5,00,000 and column count
> will
> > be around 150 approximately..! I get content issue while opening such
> large
> > excel file.
> > The version which I am using is APACHE POI 3.9
> > I had also tried with, 3.15 the same issue exist.
> >
> > System holds enough ram and disk space as per the requirement.
> >
> > 3.9 performance seems to better than other versions, Is there any way to
> > rule this issue out in that version.?
> >
> > Anyone, who has idea on above scenario, kindly help me to proceed
> further.
> >
> > Regards
> > Thamodharan B
> >
>
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Thamodharan Balraj
Sure, will do that.

On Nov 29, 2017 4:05 AM, "Greg Woolsey" <[hidden email]> wrote:

> Can you open an issue at
>
> https://bz.apache.org/bugzilla/enter_bug.cgi?product=POI
>
> with sample code and/or a sample workbook showing the problem?  That would
> help us reproduce it and track anything that requires a code change.
>
> On Tue, Nov 28, 2017 at 3:38 AM Thamodharan Balraj <
> [hidden email]> wrote:
>
>> Hi Greg Woolsey,
>>
>> I am not trying to put 5 million record in a sheet, I just tried with 0.5
>> million rows I.e (500,000 rows )with 150 column.
>>
>> As you also mentioned, I had refered the limits, I hadn't
>> crossed/violated any specification limits.
>>
>> Even I didn't apply any styles or format.
>> I just iterate and write a comman data of length 25 character in all the
>> cells.
>>
>> The workbook is done successful. But while opening it, I face the content
>> issue error. Please find the screenshot attached with the mail regarding to
>> it.
>>
>> Kindly help, if any one has idea on it.
>>
>> Regards
>> Thamodharan B
>>
>>
>>
>> On Nov 23, 2017 1:09 AM, "Greg Woolsey" <[hidden email]> wrote:
>>
>> The OOXML/Excel 2007+ limit is 2^20 (1,048,576) rows per sheet.  If you
>> are
>> trying to put 5 million rows on a single sheet, Excel will never open it.
>>
>> POI streaming may not catch it (it should, but maybe that's the bug), but
>> OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will consider
>> it
>> corrupt.
>>
>> I've written out workbooks with < 1M rows in the past, without using the
>> streaming format even.  I've just had to notice when I reached the max #
>> on
>> a sheet and start a new sheet at that point.
>>
>> See the properties of  org.apache.poi.ss.SpreadsheetVersion.EXCEL2007
>> enum
>> for version specific limits.
>>
>> On Wed, Nov 22, 2017 at 7:35 AM Thamodharan Balraj <
>> [hidden email]>
>> wrote:
>>
>> > Hi Everyone,
>> >
>> > I am trying to generate a excel with SXSSF workbook. The size of excel
>> may
>> > shoot to 200-300MB, whose row count will be 5,00,000 and column count
>> will
>> > be around 150 approximately..! I get content issue while opening such
>> large
>> > excel file.
>> > The version which I am using is APACHE POI 3.9
>> > I had also tried with, 3.15 the same issue exist.
>> >
>> > System holds enough ram and disk space as per the requirement.
>> >
>> > 3.9 performance seems to better than other versions, Is there any way to
>> > rule this issue out in that version.?
>> >
>> > Anyone, who has idea on above scenario, kindly help me to proceed
>> further.
>> >
>> > Regards
>> > Thamodharan B
>> >
>>
>>
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Thamodharan Balraj
Hi Greg,
As you suggested, I had opened an issue at Bugzilla.
Bug 61832

Regards
Thamodharan B

On Nov 29, 2017 11:20 AM, "Thamodharan Balraj" <[hidden email]>
wrote:

Sure, will do that.


On Nov 29, 2017 4:05 AM, "Greg Woolsey" <[hidden email]> wrote:

> Can you open an issue at
>
> https://bz.apache.org/bugzilla/enter_bug.cgi?product=POI
>
> with sample code and/or a sample workbook showing the problem?  That would
> help us reproduce it and track anything that requires a code change.
>
> On Tue, Nov 28, 2017 at 3:38 AM Thamodharan Balraj <
> [hidden email]> wrote:
>
>> Hi Greg Woolsey,
>>
>> I am not trying to put 5 million record in a sheet, I just tried with 0.5
>> million rows I.e (500,000 rows )with 150 column.
>>
>> As you also mentioned, I had refered the limits, I hadn't
>> crossed/violated any specification limits.
>>
>> Even I didn't apply any styles or format.
>> I just iterate and write a comman data of length 25 character in all the
>> cells.
>>
>> The workbook is done successful. But while opening it, I face the content
>> issue error. Please find the screenshot attached with the mail regarding to
>> it.
>>
>> Kindly help, if any one has idea on it.
>>
>> Regards
>> Thamodharan B
>>
>>
>>
>> On Nov 23, 2017 1:09 AM, "Greg Woolsey" <[hidden email]> wrote:
>>
>> The OOXML/Excel 2007+ limit is 2^20 (1,048,576) rows per sheet.  If you
>> are
>> trying to put 5 million rows on a single sheet, Excel will never open it.
>>
>> POI streaming may not catch it (it should, but maybe that's the bug), but
>> OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will consider
>> it
>> corrupt.
>>
>> I've written out workbooks with < 1M rows in the past, without using the
>> streaming format even.  I've just had to notice when I reached the max #
>> on
>> a sheet and start a new sheet at that point.
>>
>> See the properties of  org.apache.poi.ss.SpreadsheetVersion.EXCEL2007
>> enum
>> for version specific limits.
>>
>> On Wed, Nov 22, 2017 at 7:35 AM Thamodharan Balraj <
>> [hidden email]>
>> wrote:
>>
>> > Hi Everyone,
>> >
>> > I am trying to generate a excel with SXSSF workbook. The size of excel
>> may
>> > shoot to 200-300MB, whose row count will be 5,00,000 and column count
>> will
>> > be around 150 approximately..! I get content issue while opening such
>> large
>> > excel file.
>> > The version which I am using is APACHE POI 3.9
>> > I had also tried with, 3.15 the same issue exist.
>> >
>> > System holds enough ram and disk space as per the requirement.
>> >
>> > 3.9 performance seems to better than other versions, Is there any way to
>> > rule this issue out in that version.?
>> >
>> > Anyone, who has idea on above scenario, kindly help me to proceed
>> further.
>> >
>> > Regards
>> > Thamodharan B
>> >
>>
>>
>>
>>