[Bug 61911] New: ArrayIndexOutOfBoundsException when processing certain .doc files


Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61911

            Bug ID: 61911
           Summary: ArrayIndexOutOfBoundsException when processing certain
                    .doc files
           Product: POI
           Version: 3.17-FINAL
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: POI Overall
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

Created attachment 35616
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35616&action=edit
This file should reproduce the issue.

When Solr (7.1.0) tries to parse this .doc file, we get the exception shown
below. It seems to be related to an older form of the .doc format, because
converting the .doc to a .docx and then back to a .doc makes the issue go away.

{
  "responseHeader":{
    "status":500,
    "QTime":265},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
    "msg":"org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.microsoft.OfficeParser@20395b83",
    "trace":"org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@20395b83\r\n\tat
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)\r\n\tat
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)\r\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)\r\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2484)\r\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:720)\r\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\r\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\r\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\r\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\r\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\r\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)\r\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)\r\n\tat
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\r\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\r\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\r\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\r\n\tat
java.lang.Thread.run(Unknown Source)\r\nCaused by:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@20395b83\r\n\tat
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)\r\n\tat
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\tat
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\tat
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)\r\n\tat
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)\r\n\t...
34 more\r\nCaused by: java.lang.ArrayIndexOutOfBoundsException: -1\r\n\tat
org.apache.poi.hwpf.model.StyleSheet.getCharacterStyle(StyleSheet.java:329)\r\n\tat
org.apache.poi.hwpf.model.CHPX.getCharacterProperties(CHPX.java:74)\r\n\tat
org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:100)\r\n\tat
org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:727)\r\n\tat
org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:227)\r\n\tat
org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:712)\r\n\tat
org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:702)\r\n\tat
org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:174)\r\n\tat
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)\r\n\tat
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)\r\n\tat
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\t...
38 more\r\n",
    "code":500}}

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[Bug 61911] ArrayIndexOutOfBoundsException when processing certain .doc files

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61911

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

[Bug 61911] ArrayIndexOutOfBoundsException when processing certain .doc files

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61911

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #1 from Dominik Stadler <[hidden email]> ---
Fixed with r1819403 by adding more checks for invalid indices.
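
For readers unfamiliar with this kind of fix, here is a minimal sketch of what
"more checks for invalid indices" can look like in general. It is not the code
from r1819403 and does not use POI classes: StyleLookupSketch, lookupStyle and
the String[] style table are hypothetical stand-ins. The assumption is that a
malformed file can yield a style index such as -1 (the value reported in the
trace above), so the index is validated before the array is touched.

```java
import java.util.Optional;

public class StyleLookupSketch {

    /**
     * Hypothetical lookup into a parsed style table.
     * Returns an empty Optional instead of throwing when the index is
     * negative, past the end of the table, or points at a missing entry.
     */
    static Optional<String> lookupStyle(String[] styles, int index) {
        if (styles == null || index < 0 || index >= styles.length) {
            // Invalid index read from the file: let the caller fall back
            // to default properties rather than fail the whole parse.
            return Optional.empty();
        }
        return Optional.ofNullable(styles[index]);
    }

    public static void main(String[] args) {
        String[] styles = {"Normal", "Heading 1", null};

        System.out.println(lookupStyle(styles, 1).orElse("<default>"));
        System.out.println(lookupStyle(styles, -1).orElse("<default>"));
        System.out.println(lookupStyle(styles, 7).orElse("<default>"));
    }
}
```

With a guard like this, the caller decides what a sensible default is, and a
single bad index in one character run no longer aborts extraction of the whole
document.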
