[Bug 61295] New: Vector.read -- Java heap space on corrupt file

[Bug 61295] New: Vector.read -- Java heap space on corrupt file

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61295

            Bug ID: 61295
           Summary: Vector.read -- Java heap space on corrupt file
           Product: POI
           Version: 3.16-FINAL
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HPSF
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

Created attachment 35128
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35128&action=edit
triggering file

I started experimenting with randomly corrupting files based on feedback from
Luis Filipe Nassif [1].  The attached file triggers this:

java.lang.OutOfMemoryError: Java heap space

        at org.apache.poi.hpsf.Vector.read(Vector.java:43)
        at org.apache.poi.hpsf.TypedPropertyValue.readValue(TypedPropertyValue.java:219)
        at org.apache.poi.hpsf.VariantSupport.read(VariantSupport.java:174)
        at org.apache.poi.hpsf.Property.<init>(Property.java:179)
        at org.apache.poi.hpsf.MutableProperty.<init>(MutableProperty.java:53)
        at org.apache.poi.hpsf.Section.<init>(Section.java:237)
        at org.apache.poi.hpsf.MutableSection.<init>(MutableSection.java:41)
        at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:494)
        at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:196)
        at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:83)
        at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:74)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:155)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)


[1] https://issues.apache.org/jira/browse/TIKA-2428?focusedCommentId=16086045&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16086045

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[Bug 61295] Vector.read -- Java heap space on corrupt file

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61295

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

[Bug 61295] Vector.read -- Java heap space on corrupt file

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61295

--- Comment #1 from Tim Allison <[hidden email]> ---
The Vector size that actually causes the OOM in Tika is 1,358,954,497 on one
triggering file.  We could set an arbitrary maximum well below
Integer.MAX_VALUE, or we could read the elements into a list and then convert
that to an array.  With the latter, if the size value is corrupt, the
LittleEndianInputStream will throw an exception when asked to read beyond what
is available in the stream, so no oversized array is allocated up front.

I somewhat prefer the second option.  Commit on the way...

I'm happy to go with the first, and open to other options...
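
For anyone following along, here is a rough sketch of the second option. It is
not the code committed to POI: the class and method names are made up for
illustration, and java.io.DataInputStream (big-endian) stands in for
LittleEndianInputStream. The point is only that the declared length is never
used to size an allocation, so a corrupt length runs into the end of the
stream instead of the heap limit.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual org.apache.poi.hpsf.Vector code.
class VectorReadSketch {

    // declaredLength comes from the file and must be treated as untrusted.
    static int[] readElements(DataInputStream in, long declaredLength) throws IOException {
        if (declaredLength < 0 || declaredLength > Integer.MAX_VALUE) {
            throw new IOException("Vector length out of range: " + declaredLength);
        }
        // Grow the list as elements actually arrive; never pre-size it from declaredLength.
        List<Integer> values = new ArrayList<>();
        for (long i = 0; i < declaredLength; i++) {
            // readInt() throws EOFException once the stream is exhausted.
            values.add(in.readInt());
        }
        // Convert to an array only after every element has been read successfully.
        int[] result = new int[values.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = values.get(i);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] twoInts = {0, 0, 0, 1, 0, 0, 0, 2};

        // Honest length: both elements are read.
        int[] ok = readElements(new DataInputStream(new ByteArrayInputStream(twoInts)), 2);
        System.out.println(ok.length); // prints 2

        // Corrupt length (the value seen in the triggering file): fails fast at end of stream.
        try {
            readElements(new DataInputStream(new ByteArrayInputStream(twoInts)), 1_358_954_497L);
        } catch (EOFException e) {
            System.out.println("corrupt length rejected");
        }
    }
}
```

The trade-off against a fixed cap is that no arbitrary limit has to be chosen,
at the cost of boxing and one extra copy when the length is legitimate.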

[Bug 61295] Vector.read -- Java heap space on corrupt file

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61295

Tim Allison <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Tim Allison <[hidden email]> ---
r1802879

I didn't add a test file because I didn't think the test was worth 65kb.

I can look for a shorter triggering file if necessary.
