[Bug 62872] New: Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

            Bug ID: 62872
           Summary: Writing large files with 800k rows gives
                    java.io.IOException: This archive contains unclosed
                    entries.
           Product: POI
           Version: 4.0.0-FINAL
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: SXSSF
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

Created attachment 36225
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36225&action=edit
Sample for java.io.IOException: This archive contains unclosed entries

POI 4.0 seems to have a problem writing large XLSX files (e.g. more than 100k
rows but fewer than 1 million rows) using SXSSFWorkbook:

java.io.IOException: This archive contains unclosed entries.
        at org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.finish(ZipArchiveOutputStream.java:467)
        at org.apache.poi.xssf.streaming.SXSSFWorkbook.injectData(SXSSFWorkbook.java:406)
        at org.apache.poi.xssf.streaming.SXSSFWorkbook.write(SXSSFWorkbook.java:936)

Please note that small files (e.g. fewer than 100k rows) seem to work fine, and
everything also works fine with 3.18 (same code, same data).

The attached sample reproduces the error.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andreas@manticore-projects.
                   |                            |com

--- Comment #1 from [hidden email] ---
Created attachment 36226
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36226&action=edit
Test Case

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |regression

--- Comment #2 from Dominik Stadler <[hidden email]> ---
Based on discussion on user-list, this worked in 3.17.

--- Comment #3 from Dominik Stadler <[hidden email]> ---
Reproducible with 600_000 rows, not reproducible with 400_000 rows.

The temporary file is approx. 200 MB, so this is not a 2 GB issue.

When the exception is thrown in ZipArchiveOutputStream, the variables have the
following contents:

this.finished = false
this.entry = {ZipArchiveOutputStream$CurrentEntry@2501}
 entry = {ZipArchiveEntry@2523} "xl/worksheets/sheet1.xml"
  method = 8
  size = 4497373312
  internalAttributes = 0
  versionRequired = 0
  versionMadeBy = 0
  platform = 0
  rawFlag = 0
  externalAttributes = 0
  alignment = 0
  extraFields = null
  unparseableExtra = null
  name = "xl/worksheets/sheet1.xml"
  rawName = null
  gpb = {GeneralPurposeBit@2525}
  localHeaderOffset = -1
  dataOffset = -1
  isStreamContiguous = false
  nameSource = {ZipArchiveEntry$NameSource@2526} "NAME"
  commentSource = {ZipArchiveEntry$CommentSource@2527} "COMMENT"
  ZipEntry.name = "xl/worksheets/sheet1.xml"
  xdostime = 276176132385
  mtime = null
  atime = null
  ctime = null
  crc = 2326399640
  ZipEntry.size = -1
  csize = 228388641
  ZipEntry.method = -1
  flag = 0
  extra = null
  comment = null
 localDataStart = 2340
 dataStart = 2380
 bytesRead = 4497373312
 causedUseOfZip64 = false
 hasWritten = false

--- Comment #4 from Andreas Beeker <[hidden email]> ---
My guess is that this is a masked (shadowed) exception in
SXSSFWorkbook.injectData. If you add some checkpoint variables, you can see
that it enters the "finally" block for "zos.closeArchiveEntry()" but never
actually finishes it.
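
As an illustration of this kind of masking, here is a minimal, self-contained
Java sketch. It does not use POI or commons-compress: closeEntry() and finish()
are hypothetical stand-ins for zos.closeArchiveEntry() and the later finish()
call, and the "4 GB" message is invented for the demo (only the "unclosed
entries" text comes from the stack trace in this report).

```java
import java.io.IOException;

class MaskedExceptionDemo {
    // Hypothetical stand-in: closing an entry fails with the real cause.
    static void closeEntry() throws IOException {
        throw new IOException("entry exceeds the 4 GB limit");
    }

    // Hypothetical stand-in: finishing the archive fails because the entry is still open.
    static void finish() throws IOException {
        throw new IOException("This archive contains unclosed entries.");
    }

    // The exception thrown inside "finally" replaces the one from "try".
    static String masked() {
        try {
            try {
                closeEntry();
            } finally {
                finish();
            }
            return "no exception";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    // Keeps the primary exception and attaches the secondary one as suppressed.
    static String preserved() {
        try {
            IOException primary = null;
            try {
                closeEntry();
            } catch (IOException e) {
                primary = e;
                throw e;
            } finally {
                try {
                    finish();
                } catch (IOException secondary) {
                    if (primary == null) {
                        throw secondary;
                    }
                    primary.addSuppressed(secondary);
                }
            }
            return "no exception";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("masked:    " + masked());
        System.out.println("preserved: " + preserved());
    }
}
```

In masked() only the secondary message reaches the caller, which matches what
the stack trace in this report looks like; preserved() shows one possible way
to keep the original cause visible.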

--- Comment #5 from Andreas Beeker <[hidden email]> ---
It is a masked (shadowed) exception complaining about the 4 GB limit.
When using "zos.setUseZip64(Zip64Mode.Always)" the test runs through
successfully, but the result can't be opened in LibreOffice.

I'm now checking whether the XML differs between 3.17 and 4.0.0 or whether it's
caused by the 64-bit zip stream.
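
The 4 GB limit can be checked against the values from the debugger dump in
Comment #3. The sketch below is plain Java arithmetic, not POI or
commons-compress code; the class and method names are made up for
illustration. It assumes the classic ZIP entry header stores sizes in unsigned
32-bit fields, so anything above 2^32 - 1 bytes needs ZIP64.

```java
class Zip32SizeCheck {
    // Largest size a classic (non-ZIP64) entry header can hold: 2^32 - 1 bytes.
    static final long ZIP32_MAX = 0xFFFFFFFFL;

    // True if either size no longer fits into an unsigned 32-bit field.
    static boolean needsZip64(long uncompressedSize, long compressedSize) {
        return uncompressedSize > ZIP32_MAX || compressedSize > ZIP32_MAX;
    }

    public static void main(String[] args) {
        long size = 4_497_373_312L;  // "size" of xl/worksheets/sheet1.xml in the dump
        long csize = 228_388_641L;   // "csize" in the dump

        System.out.println("needs ZIP64: " + needsZip64(size, csize));
        System.out.println("bytes over the limit: " + (size - ZIP32_MAX));
    }
}
```

So the compressed data (about 228 MB) is far below any limit, which fits the
roughly 200 MB temporary file, while the uncompressed sheet XML is what crosses
the 4 GB boundary.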

--- Comment #6 from Andreas Beeker <[hidden email]> ---
Patched via r1845629

The change now implicitly sets the stream entry to 64-bit based on the given
file size; the other stream entries stay in normal mode (... I guess ...).

I've compared the sheet content in 3.17 vs. trunk, but there weren't any
differences.

As this still produces files which can't be opened in LibreOffice/MS Office,
I'm leaving this issue open.

--- Comment #7 from Andreas Beeker <[hidden email]> ---
Created attachment 36258
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36258&action=edit
modified test

I forgot to mention that with POI 3.17 I also couldn't produce a file which
could be opened by LibreOffice ... I haven't tried with MS Excel. This is also
due to the double "i++" in the test, which creates more than the 1048576
logical rows a sheet allows with a ROW_COUNT of 600000.

Please run your fixed test again, with POI 3.17 vs. the current trunk.

For comparison of the unzipped XMLs, you might want to use the same test data.
I've attached my modified version, which inserts reproducible timestamps.

--- Comment #8 from [hidden email] ---
LibreOffice does not seem to support ZIP64.

I have created 2 small files with the same content, but only one written as
ZIP64.

Gnumeric was able to open both files; LibreOffice fails on the ZIP64 one.
I have refreshed LibreOffice bug 82984 accordingly
(https://bugs.documentfoundation.org/show_bug.cgi?id=82984#c10).

Setting Zip64 to Always for archives exceeding 4GB and/or 65536 entries would
be the correct solution. When doing so, a warning related to the use of
LibreOffice would be great.
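
For anyone who wants to tell such files apart without opening them, here is a
rough, self-contained Java sketch. It assumes the usual ZIP layout, in which a
ZIP64 archive carries a "ZIP64 end of central directory locator" (signature
bytes 50 4B 06 07, 20 bytes long) directly in front of the normal end of
central directory record (signature bytes 50 4B 05 06). The class and method
names are hypothetical, and the backward scan is a heuristic that could be
fooled by an archive comment containing the same bytes.

```java
class Zip64Probe {
    // Checks for a 4-byte record signature "PK" + b2 + b3 at the given position.
    private static boolean hasSignature(byte[] data, int pos, int b2, int b3) {
        return data[pos] == 0x50 && data[pos + 1] == 0x4B
                && data[pos + 2] == b2 && data[pos + 3] == b3;
    }

    // Finds the last end-of-central-directory record (at least 22 bytes long)
    // and reports whether a ZIP64 locator sits in the 20 bytes before it.
    static boolean looksLikeZip64(byte[] zip) {
        for (int i = zip.length - 22; i >= 0; i--) {
            if (hasSignature(zip, i, 0x05, 0x06)) {
                return i >= 20 && hasSignature(zip, i - 20, 0x06, 0x07);
            }
        }
        return false; // no end-of-central-directory record found at all
    }
}
```

For small test files like the two attached here, the bytes could be read with
java.nio.file.Files.readAllBytes(path) and passed to looksLikeZip64().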

--- Comment #9 from [hidden email] ---
Created attachment 36286
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36286&action=edit
AS NEEDED

--- Comment #10 from [hidden email] ---
Created attachment 36287
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36287&action=edit
ALWAYS

Andreas Beeker <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |57342


Referenced Bugs:

https://bz.apache.org/bugzilla/show_bug.cgi?id=57342
[Bug 57342] Writing very large file via SXSSF leads to corrupt file

--- Comment #11 from PJ Fanning <[hidden email]> ---
I added a method to SXSSFWorkbook so that you can set the Zip64Mode -
https://svn.apache.org/viewvc?view=revision&revision=1848179
