DO NOT REPLY [Bug 35045] New: - Extracting text from word files fails

Bugzilla from bugzilla@apache.org
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=35045>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=35045

           Summary: Extracting text from word files fails
           Product: POI
           Version: 2.5
          Platform: PC
        OS/Version: Windows 2000
            Status: NEW
          Severity: critical
          Priority: P2
         Component: POI Overall
        AssignedTo: [hidden email]
        ReportedBy: [hidden email]


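Hello,

I am trying to use POI to extract the text of some Word documents with the
following code:

    import java.io.StringWriter;
    import org.apache.poi.hdf.extractor.WordDocument;

    StringWriter writer = new StringWriter();
    // Per the stack trace below, the failure is thrown inside this constructor.
    WordDocument doc = new WordDocument("C:\\arj\\pdf\\peer.doc");
    doc.openDoc();
    doc.writeAllText(writer);
    System.out.println(writer.toString());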
Some Word files fail with the following exception:
java.lang.NullPointerException
        at org.apache.poi.hdf.extractor.Utils.convertBytesToShort(Utils.java:47)
        at org.apache.poi.hdf.extractor.StyleSheet.doCHPOperation(StyleSheet.java:176)
        at org.apache.poi.hdf.extractor.StyleSheet.uncompressProperty(StyleSheet.java:685)
        at org.apache.poi.hdf.extractor.StyleSheet.uncompressProperty(StyleSheet.java:565)
        at org.apache.poi.hdf.extractor.WordDocument.addParagraphContent(WordDocument.java:1050)
        at org.apache.poi.hdf.extractor.WordDocument.createParagraph(WordDocument.java:942)
        at org.apache.poi.hdf.extractor.WordDocument.addBlockContent(WordDocument.java:876)
        at org.apache.poi.hdf.extractor.WordDocument.writeSection(WordDocument.java:681)
        at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:211)
        at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:186)
        at zb.sts.text.WordTester.main(WordTester.java:27)
Exception in thread "main"
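The trace ends in Utils.convertBytesToShort, called from
StyleSheet.doCHPOperation while character properties are being uncompressed.
An NPE at that depth suggests, though the report does not confirm it, that a
null byte array reached the conversion. The sketch below is a hypothetical,
stand-alone null-safe reader: SafeBytes.readShortLE is an illustrative name,
not POI's API, and its signature is an assumption. It reads a little-endian
16-bit value (the byte order of Word's binary format) and returns a
caller-supplied fallback instead of throwing.

```java
// Hypothetical helper, NOT POI's Utils.convertBytesToShort: it only shows how
// a null or too-short byte array could be guarded instead of causing an NPE.
final class SafeBytes {

    private SafeBytes() {
    }

    /**
     * Reads a little-endian 16-bit value from data at offset.
     * Returns fallback if the array is null or has fewer than two
     * bytes available at offset.
     */
    static short readShortLE(byte[] data, int offset, short fallback) {
        // Written as "offset > length - 2" so that offset + 1 cannot overflow.
        if (data == null || offset < 0 || offset > data.length - 2) {
            return fallback;
        }
        int low = data[offset] & 0xFF;       // least significant byte comes first
        int high = data[offset + 1] & 0xFF;  // most significant byte comes second
        return (short) ((high << 8) | low);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0x34, (byte) 0x12};
        System.out.println(Integer.toHexString(readShortLE(bytes, 0, (short) 0))); // prints 1234
        System.out.println(readShortLE(null, 0, (short) -1));                      // prints -1
        System.out.println(readShortLE(new byte[1], 0, (short) -1));               // prints -1
    }
}
```

Returning a fallback only masks the malformed property; why the array is null
in the first place, and whether skipping that property is acceptable for
plain-text extraction, would still have to be checked in POI's StyleSheet code.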


For other Word files, the text is only partially extracted.
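To separate the two failure modes (an exception versus partial text) across
many documents, a small harness can record one outcome per file. In this
sketch, ExtractionSurvey and TextExtractor are hypothetical names; the
interface stands in for the reporter's WordDocument calls so the harness
compiles without POI, and the character count is only a rough signal to
compare by hand against what Word displays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness: TextExtractor is NOT a POI type. In practice it would
// wrap the reporter's calls (new WordDocument(path), openDoc(), writeAllText(writer)).
final class ExtractionSurvey {

    /** Anything that turns a file path into extracted text. */
    interface TextExtractor {
        String extract(String path) throws Exception;
    }

    /** Outcome for one file: a character count on success, the exception on failure. */
    static final class Result {
        final String path;
        final int chars;        // -1 when extraction threw
        final Exception error;  // null when extraction succeeded

        Result(String path, int chars, Exception error) {
            this.path = path;
            this.chars = chars;
            this.error = error;
        }
    }

    /** Runs the extractor on every path, recording failures instead of stopping at the first one. */
    static List<Result> survey(List<String> paths, TextExtractor extractor) {
        List<Result> results = new ArrayList<>();
        for (String path : paths) {
            try {
                String text = extractor.extract(path);
                results.add(new Result(path, text == null ? 0 : text.length(), null));
            } catch (Exception e) {
                results.add(new Result(path, -1, e));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in extractor for demonstration only: "broken.doc" simulates the NPE.
        TextExtractor fake = path -> {
            if (path.endsWith("broken.doc")) {
                throw new NullPointerException("simulated");
            }
            return "text of " + path;
        };
        for (Result r : survey(List.of("a.doc", "broken.doc"), fake)) {
            String outcome = r.error == null ? r.chars + " chars" : "FAILED: " + r.error;
            System.out.println(r.path + " -> " + outcome);
        }
    }
}
```

Because each file's exception is kept, the harness produces the full list of
failing documents and their traces, rather than aborting the whole run on the
first file that throws.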

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
Mailing List:    http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta POI Project: http://jakarta.apache.org/poi/