[Bug 61275] New: Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

            Bug ID: 61275
           Summary: Excel could not open <file> because some content is
                    unreadable
           Product: POI
           Version: 3.15-FINAL
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HSSF
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

Created attachment 35110
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35110&action=edit
corrupt xlsx file

Hi, if I generate an xlsx file using a war deployed on Tomcat, I do not get an
error.  When I generate the same xlsx file using a war deployed on WebSphere I
get the above error.  I am attributing the difference to a difference in
implementation of the zip functionality between Oracle JVM and IBM JVM (so
perhaps this bug should go to IBM).

If I try to open the corrupt version, it gives me the message, and I can choose
to fix it, whereupon it opens.  However, the excel log does not give me
information about what was fixed.

There are slight differences between a couple of files in the resulting diffs.  
Here are the differences in the output of zipinfo:

Good xlsx:

gloucester-pc:Downloads cbuxbaum$ zipinfo Results\ \(13\).xlsx
Archive:  Results (13).xlsx   6704 bytes   11 files
-rw----     2.0 fat      598 bl defN 10-Jul-17 13:43 _rels/.rels
-rw----     2.0 fat     1197 bl defN 10-Jul-17 13:43 [Content_Types].xml
-rw----     2.0 fat      184 bl defN 10-Jul-17 13:43 docProps/app.xml
-rw----     2.0 fat      443 bl defN 10-Jul-17 13:43 docProps/core.xml
-rw----     2.0 fat      131 bl defN 10-Jul-17 13:43 xl/drawings/drawing1.xml
-rw----     2.0 fat      138 bl defN 10-Jul-17 13:43 xl/sharedStrings.xml
-rw----     2.0 fat     3601 bl defN 10-Jul-17 13:43 xl/styles.xml
-rw----     2.0 fat      350 bl defN 10-Jul-17 13:43 xl/workbook.xml
-rw----     2.0 fat      576 bl defN 10-Jul-17 13:43 xl/_rels/workbook.xml.rels
-rw----     2.0 fat    22610 bl defN 10-Jul-17 13:43 xl/worksheets/sheet1.xml
-rw----     2.0 fat      305 bl defN 10-Jul-17 13:43 xl/worksheets/_rels/sheet1.xml.rels
11 files, 30133 bytes uncompressed, 5230 bytes compressed:  82.6%

Bad xlsx:

gloucester-pc:Downloads cbuxbaum$ zipinfo Results\ \(15\).xlsx
Archive:  Results (15).xlsx   6745 bytes   11 files
-rw----     2.0 fat      596 bl defN 10-Jul-17 13:52 _rels/.rels
-rw----     2.0 fat     1195 bl defN 10-Jul-17 13:52 [Content_Types].xml
-rw----     2.0 fat      184 bl defN 10-Jul-17 13:52 docProps/app.xml
-rw----     2.0 fat      441 bl defN 10-Jul-17 13:52 docProps/core.xml
-rw----     2.0 fat      131 bl defN 10-Jul-17 13:52 xl/drawings/drawing1.xml
-rw----     2.0 fat      138 bl defN 10-Jul-17 13:52 xl/sharedStrings.xml
-rw----     2.0 fat     3601 bl defN 10-Jul-17 13:52 xl/styles.xml
-rw----     2.0 fat      350 bl defN 10-Jul-17 13:52 xl/workbook.xml
-rw----     2.0 fat      574 bl defN 10-Jul-17 13:52 xl/_rels/workbook.xml.rels
-rw----     2.0 fat    22610 bl defN 10-Jul-17 13:52 xl/worksheets/sheet1.xml
-rw----     2.0 fat      303 bl defN 10-Jul-17 13:52 xl/worksheets/_rels/sheet1.xml.rels

tail of od command on _rels/.rels:

good version:

0001100    037057  005015  027474  062522  060554  064564  067157  064163
0001120    070151  037163  005015                                        
0001126

bad version:

0001100    037057  005015  027474  062522  060554  064564  067157  064163
0001120    070151  037163                                                
0001124
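
For readers unfamiliar with od's default output: each column is a 16-bit word printed in octal, and on a little-endian machine (an assumption here, though it matches the Mac the listing came from) the low byte comes first in the file. A minimal Java sketch for turning those words back into text follows; the class and method names are made up for illustration.

```java
final class OdDecode {
    /** Decodes od-style octal 16-bit words, assuming little-endian byte order. */
    static String decode(String... words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            int v = Integer.parseInt(w, 8);   // parse the octal word
            sb.append((char) (v & 0xff));     // low byte is first in the file
            sb.append((char) (v >> 8));       // high byte follows
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Words copied from the "good version" tail above.
        String tail = decode("037057", "005015", "027474", "062522", "060554",
                "064564", "067157", "064163", "070151", "037163", "005015");
        // Make CR and LF visible when printing.
        System.out.println(tail.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Decoded this way, both tails end in `</Relationships>`, and the good file carries one extra CRLF after it. The final offsets (octal 1126 and 1124) are 598 and 596 in decimal, which matches the _rels/.rels sizes in the two zipinfo listings.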

I am attaching the following:

Our Excel struts action
The Excel Helper class that builds the excel

The bad(corrupt) xlsx
The good xlsx

Thanks!

Carl Buxbaum

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #1 from Carl Buxbaum <[hidden email]> ---
Created attachment 35111
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35111&action=edit
working xlsx file, generated with Tomcat Oracle JVM 1.8


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

Carl Buxbaum <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #35110|corrupt xlsx file           |corrupt xlsx file generated
        description|                            |with Websphere IBM JVM 1.8


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #2 from Carl Buxbaum <[hidden email]> ---
Created attachment 35112
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35112&action=edit
struts action for building excel


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #3 from Carl Buxbaum <[hidden email]> ---
Created attachment 35113
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35113&action=edit
Helper class for building xlsx


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

Nick Burch <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #4 from Nick Burch <[hidden email]> ---
Assuming you have an active IBM support contract for your troublesome WebSphere
install, I'd suggest punting it over to them. They'll have everything set up to
reproduce it, which I'm not sure any of us do here, and they'll be much more
experienced at debugging WebSphere + IBM JVM issues!


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #5 from Carl Buxbaum <[hidden email]> ---
Hi Nick,

This issue originated from a customer of ours, so I would say that it's not
a particular WebSphere install that is the problem.  I would reckon that
any xlsx built by poi using the IBM JVM will exhibit the same problem, so
if there is something that poi can do to help create a compatible zip,
then it may be useful for someone in the poi project to take a look.


Thanks,

Carl


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #6 from Javen O'Neal <[hidden email]> ---
(In reply to Nick Burch from comment #4)
> I'm not sure any of us [have Websphere IBM Java set up to reproduce this], and
> they'll be much more experienced at debugging websphere + ibm jvm issues!

Here are our automated builds. We do not have an enabled job that runs on an
IBM JDK, let alone on WebSphere.
https://builds.apache.org/view/P/view/POI/

It's difficult to fix something without being able to reproduce or test it.


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

Andreas Beeker <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Hardware|PC                          |Macintosh
          Component|HSSF                        |XSSF


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #7 from Andreas Beeker <[hidden email]> ---
Would you mind trying it with the trunk version?
... I've deactivated indenting after the beta came out - if it's a newline
problem, this might be solved by that.

Furthermore I'm not sure if that comment is relevant or true, but regarding
[1]:
"NOTE: If you are using a unix system, be aware of linebreaks, the OPC uses
CRLF not LF."


[1] https://stackoverflow.com/questions/36063375


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #8 from Carl Buxbaum <[hidden email]> ---
Hi, I put Windows as hardware because the issue occurs when the document is
generated and served by a WebSphere instance running under IBM JVM 1.8, not
when served by my local, which happens to be a mac.


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

Stefan Bodewig <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #9 from Stefan Bodewig <[hidden email]> ---
Created attachment 35114
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35114&action=edit
content diff

The only content differences are missing new lines at the end of a couple of
XML files, this shouldn't throw off an XML parser but if the format is
sensitive to line-ends, this may well become a problem.


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #10 from Andreas Beeker <[hidden email]> ---
(In reply to Stefan Bodewig from comment #9)
> The only content differences are missing new lines at the end of a couple of
> XML files, this shouldn't throw off an XML parser but if the format is
> sensitive to line-ends, this may well become a problem.

One piece of information is missing here, which was in Carl's email thread, so
the XML is probably not the culprit:

> If I create a "corrupt" excel, unzip it, and then zip it back up (on my
> mac using zip command), the resulting zip file opens without issue.
> If a colleague on Windows generates the same excel, and does the same
> probably using windup, the corruption remains.


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #11 from Mark Murphy <[hidden email]> ---
I use the IBM JVM v1.6.26 on IBM i, no Websphere. It has no issues with POI
3.14. I can create valid XLSX documents.

(In reply to Carl Buxbaum from comment #5)

> Hi Nick,
>
> This issue originated from a customer of ours, so I would say that it's not
> a particular WebSphere install that is the problem.  I would reckon that
> any xlsx built by poi using the IBM JVM will exhibit the same problem, so
> if there is something that poi can do to help create a compatible zip,
> then it may be useful for someone in the poi project to take a look.
>
>
> Thanks,
>
> Carl


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #12 from Stefan Bodewig <[hidden email]> ---
I started looking through the archives in a hex-editor and the first local file
headers looked fine, so I skipped to the end.

While Results_good.xlsx ends with the "end of central directory" record one
would expect, the Results_bad.xlsx contains some garbage after it, namely

Error 500: java.lang.IllegalStateException\r\n

so it seems some log output has made it into the stream.

Usually ZIP archivers will read an archive from the back looking for the EOCD
record (or its ZIP64 cousin) and ignore garbage at the end. It is not uncommon
to find code for self-extracting archives there. This is why zip and friends
won't complain about the archive; Excel seems to be pickier.
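
The check described above can be sketched in Java: scan backwards for the EOCD signature (bytes 50 4B 05 06), read the 2-byte comment length at offset 20 of that record, and count whatever follows. This is only a rough illustration with hypothetical class and method names; it ignores ZIP64 and is not how POI or Excel validate a package.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class ZipTrailingBytes {
    private static final int EOCD_MIN_SIZE = 22; // fixed part of the EOCD record

    /** Returns the number of bytes after the last EOCD record, or -1 if none is found. */
    static long trailingBytes(byte[] data) {
        for (int i = data.length - EOCD_MIN_SIZE; i >= 0; i--) {
            if (data[i] == 0x50 && data[i + 1] == 0x4b
                    && data[i + 2] == 0x05 && data[i + 3] == 0x06) {
                // Comment length is a little-endian 16-bit value at offset 20.
                int commentLen = (data[i + 20] & 0xff) | ((data[i + 21] & 0xff) << 8);
                long end = (long) i + EOCD_MIN_SIZE + commentLen;
                if (end <= data.length) {
                    return data.length - end;
                }
                // Otherwise this was a chance match; keep scanning backwards.
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("bytes after EOCD: " + trailingBytes(data));
    }
}
```

Run against a file path given as the first argument, a clean archive should report 0, and anything larger points at data appended after the archive was finished.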


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #13 from Stefan Bodewig <[hidden email]> ---
Just to make sure it really is the garbage at the end you could perform

head --bytes=-44 < Results_bad.xlsx > Results.xlsx

and try to feed Results.xlsx to Excel. I've only got LibreOffice installed,
which accepts the "corrupt" sheet without any warning, so I cannot check
myself.
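
Negative byte counts are a GNU head extension, so on systems without it the same truncation could be done with a few lines of Java. This is a sketch only; the class name, method name, and argument order are invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class DropTail {
    /** Returns a copy of data without its last n bytes. */
    static byte[] dropTail(byte[] data, int n) {
        if (n < 0 || n > data.length) {
            throw new IllegalArgumentException("cannot drop " + n + " of " + data.length + " bytes");
        }
        return Arrays.copyOf(data, data.length - n);
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java DropTail Results_bad.xlsx Results.xlsx 44
        byte[] in = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), dropTail(in, Integer.parseInt(args[2])));
    }
}
```

The 44 matches the length of the appended message: 42 characters of "Error 500: java.lang.IllegalStateException" plus the trailing CR and LF.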


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #14 from Stefan Bodewig <[hidden email]> ---
Created attachment 35121
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35121&action=edit
potentially "repaired" Results_bad.xlsx

I realized my command may have been GNU head specific, so uploaded the result
directly.


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #15 from Carl Buxbaum <[hidden email]> ---
(In reply to Stefan Bodewig from comment #14)
> Created attachment 35121 [details]
> potentially "repaired" Results_bad.xlsx
>
> I realized my command may have been GNU head specific, so uploaded the
> result directly.

Thank you so much!  I don't know why I did not look directly at the excel file
in an editor.

I did edit out the IllegalStateException and it still appears corrupt, but I
am mystified as to how that exception would get in there in the first place.  I
do not see anything being thrown in the logs.

Thanks,

Carl


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #16 from Dominik Stadler <[hidden email]> ---
The latest "repaired Results_bad.xlsx" opens for me in Excel without any
corruption warning, so I don't see what we can do here from our end.

If you still think there is a problem in POI please try to provide a more
self-sufficient and minimal unit test which produces the corrupt file. The
current code is intertwined with Apache Struts code and other things that are
not related to the problem at all.


[Bug 61275] Excel could not open <file> because some content is unreadable

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61275

--- Comment #17 from Carl Buxbaum <[hidden email]> ---
(In reply to Dominik Stadler from comment #16)
> The latest "repaired Results_bad.xlsx" opens for me in Excel without any
> corruption warning, so I don't see what we can do here from our end.
>
> If you still think there is a problem in POI please try to provide a more
> self-sufficient and minimal unit test which produces the corrupt file. The
> current code is intertwined with Apache Struts code and other things that
> are not related to the problem at all.

Hi Dominik et al.,

I did finally discover what was causing this.  Essentially, doing a redirect
instead of a forward fixes it in our Struts application:

The JSP uses a JspWriter to create the JSP page, and the Struts action uses the
OutputStream from the response to stream the Excel file. It is not permissible
for both to be used in the same request, and the resulting error causes a
message to be appended to the OutputStream (and therefore to the Excel
spreadsheet, corrupting it). Therefore, the fix is to redirect to the Struts
action, which creates a new request that only handles the response
OutputStream, instead of forwarding to the Struts action, which streams the
Excel file in the same request as the JSP. I don't know why this only manifests
in WebSphere, while Tomcat seems not to have this issue.
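
The rule involved is that a servlet response hands out either a character writer or a byte stream, and asking for the second after the first raises an IllegalStateException. The sketch below models that with a toy class so it runs without a container; ToyResponse and ForwardVsRedirect are stand-ins made up for this illustration, not the real servlet or Struts API, and the order of calls in the actual application is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintWriter;

/** Toy stand-in for a servlet response; NOT the real servlet API. */
final class ToyResponse {
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private PrintWriter writer;
    private boolean streamTaken;

    PrintWriter getWriter() {
        if (streamTaken) throw new IllegalStateException("getOutputStream() already called");
        if (writer == null) writer = new PrintWriter(body, true);
        return writer;
    }

    OutputStream getOutputStream() {
        if (writer != null) throw new IllegalStateException("getWriter() already called");
        streamTaken = true;
        return body;
    }
}

final class ForwardVsRedirect {
    /** Forward: page and Excel action share one response, so the second call fails. */
    static boolean forwardFails() {
        ToyResponse shared = new ToyResponse();
        shared.getWriter().print("<html>");   // the JSP side writes characters
        try {
            shared.getOutputStream();         // the action then wants raw bytes
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    /** Redirect: the action gets a fresh response and may take the byte stream. */
    static boolean redirectWorks() {
        ToyResponse first = new ToyResponse();
        first.getWriter().print("redirecting");
        ToyResponse second = new ToyResponse();   // new request, new response
        try {
            second.getOutputStream();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("forward fails: " + forwardFails());
        System.out.println("redirect works: " + redirectWorks());
    }
}
```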
