[Bug 61265] New: Unable to parse

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] New: Unable to parse

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

            Bug ID: 61265
           Summary: Unable to parse
           Product: POI
           Version: 3.16-FINAL
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HDF
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

Created attachment 35104
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35104&action=edit
unable to parse

The full exception stack trace is included below:

org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@547eb45
        at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
        at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
        at
org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
        at
org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:74)
        at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:357)
        at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:308)
        at
org.apache.tika.gui.ParsingTransferHandler.importFiles(ParsingTransferHandler.java:94)
        at
org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:77)
        at javax.swing.TransferHandler.importData(Unknown Source)
        at javax.swing.TransferHandler$DropHandler.drop(Unknown Source)
        at java.awt.dnd.DropTarget.drop(Unknown Source)
        at javax.swing.TransferHandler$SwingDropTarget.drop(Unknown Source)
        at sun.awt.dnd.SunDropTargetContextPeer.processDropMessage(Unknown
Source)
        at
sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchDropEvent(Unknown
Source)
        at
sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchEvent(Unknown
Source)
        at sun.awt.dnd.SunDropTargetEvent.dispatch(Unknown Source)
        at java.awt.Component.dispatchEventImpl(Unknown Source)
        at java.awt.Container.dispatchEventImpl(Unknown Source)
        at java.awt.Component.dispatchEvent(Unknown Source)
        at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
        at java.awt.LightweightDispatcher.processDropTargetEvent(Unknown
Source)
        at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
        at java.awt.Container.dispatchEventImpl(Unknown Source)
        at java.awt.Window.dispatchEventImpl(Unknown Source)
        at java.awt.Component.dispatchEvent(Unknown Source)
        at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
        at java.awt.EventQueue.access$500(Unknown Source)
        at java.awt.EventQueue$3.run(Unknown Source)
        at java.awt.EventQueue$3.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at
java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown
Source)
        at
java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown
Source)
        at java.awt.EventQueue$4.run(Unknown Source)
        at java.awt.EventQueue$4.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at
java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown
Source)
        at java.awt.EventQueue.dispatchEvent(Unknown Source)
        at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
        at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
        at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
        at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
        at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
        at java.awt.EventDispatchThread.run(Unknown Source)
Caused by: java.lang.ArrayIndexOutOfBoundsException
        at java.lang.System.arraycopy(Native Method)
        at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
        at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:288)
        at
org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:157)
        at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:171)
        at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
        at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        ... 43 more

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] Unable to parse

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]
                 OS|                            |All

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] Unable to parse

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |major

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] Unable to parse

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

--- Comment #1 from PJ Fanning <[hidden email]> ---
That issue appears with latest poi code.
One possible workaround is to try docx.
I converted the file to docx and it parsed ok.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] AIOOBE wen reading doc file (Section table parsing)

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

PJ Fanning <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Unable to parse             |AIOOBE wen reading doc file
                   |                            |(Section table parsing)

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] AIOOBE when reading doc file (Section table parsing)

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

PJ Fanning <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|AIOOBE wen reading doc file |AIOOBE when reading doc
                   |(Section table parsing)     |file (Section table
                   |                            |parsing)

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] AIOOBE when reading doc file (Section table parsing)

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

--- Comment #2 from Javen O'Neal <[hidden email]> ---
Does POI choke if you re-save as doc in Word? Sometimes different versions of
Word or other office software produce different files.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] AIOOBE when reading doc file (Section table parsing)

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|HDF                         |HWPF

--- Comment #3 from Javen O'Neal <[hidden email]> ---
L

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] AIOOBE when reading doc file (Section table parsing)

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] AIOOBE when reading doc file (Section table parsing)

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #4 from [hidden email] ---
Converting to docx does not seem to be the fix as the file conversion process
changes the file meta data.

Is there is a way to update POI to handle the original doc file?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] AIOOBE when reading doc file (Section table parsing)

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

--- Comment #5 from PJ Fanning <[hidden email]> ---
I converted the doc file using MS Word. I don't think that POI can be used to
convert the file right now.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] AIOOBE when reading doc file (Section table parsing)

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

--- Comment #6 from PJ Fanning <[hidden email]> ---
Using MS Word to resave as doc or docx makes the file parseable in POI.
I used MS Word on a Mac (Word v15.29).

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61265] AIOOBE when reading doc file (Section table parsing)

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61265

--- Comment #7 from [hidden email] ---
HOw to resave the file from doc to docx while retaining the original file meta
data (especially content creation date)?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Loading...