[Bug 58067] New: getText() of XWPFParagraph returns deleted text if in "review" mode

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=58067

            Bug ID: 58067
           Summary: getText() of XWPFParagraph returns deleted text if in
                    "review" mode
           Product: POI
           Version: unspecified
          Hardware: Macintosh
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XWPF
          Assignee: [hidden email]
          Reporter: [hidden email]

Created attachment 32843
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32843&action=edit
A test file to reproduce the problem with

Dear all,

I'm looking for a simple way to read only the latest version of an XWPF
document (as if all tracked changes had been accepted). As far as I could tell
from searching and from browsing the Javadoc, Apache POI has no such
functionality. Is that correct?
Steps to reproduce:
- Open an MS Word document
- Turn on Track Changes
- Remove text from the document (while changes are tracked)
- Save (see the attached file)

- Open the file with Apache POI
- Iterate through the paragraphs
- Call getText() on each paragraph

Outcome: The removed text is returned.
Expected: Only text of the "final version" of the document is returned.
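
To illustrate the expected behaviour outside of POI, here is a minimal sketch
that uses only the JDK's DOM parser. It assumes that a tracked deletion is
stored as a w:del element wrapping runs whose text sits in w:delText, which is
how WordprocessingML represents it. The class name FinalTextSketch, the method
finalText, and the sample XML are hypothetical and are not part of POI; this is
not the patch attached to this bug.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FinalTextSketch {
    // Namespace of the WordprocessingML main document part.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical paragraph: a kept run, a tracked deletion, another kept run.
    static final String SAMPLE =
            "<w:p xmlns:w=\"" + W_NS + "\">"
            + "<w:r><w:t xml:space=\"preserve\">Kept text. </w:t></w:r>"
            + "<w:del w:id=\"1\" w:author=\"Reviewer\">"
            + "<w:r><w:delText xml:space=\"preserve\">Removed text. </w:delText></w:r>"
            + "</w:del>"
            + "<w:r><w:t>More kept text.</w:t></w:r>"
            + "</w:p>";

    // Returns the paragraph text as it would read with all deletions accepted.
    static String finalText(String paragraphXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(paragraphXml)));
            StringBuilder out = new StringBuilder();
            collect(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse paragraph XML", e);
        }
    }

    // Depth-first walk: skip w:del subtrees, append the content of w:t elements.
    private static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && W_NS.equals(node.getNamespaceURI())) {
            String name = node.getLocalName();
            if ("del".equals(name)) {
                return; // tracked deletion: ignore everything inside it
            }
            if ("t".equals(name)) {
                out.append(node.getTextContent());
                return;
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    public static void main(String[] args) {
        // prints "Kept text. More kept text."
        System.out.println(finalText(SAMPLE));
    }
}
```

The sketch only handles deletions; other tracked-change markup (for example
moved text) is deliberately left out to keep it short.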

Best,
Henning

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

--- Comment #1 from [hidden email] ---
Created attachment 32844
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32844&action=edit
Patch

Here is a patch that checks whether a deletion item is associated with a run
before adding the run's text. I'm not sure which other items could contain such
a deletion, so I only checked XWPFRuns.

[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #2 from [hidden email] ---
The fix is a simple check:

             if (run instanceof XWPFRun) {
+                XWPFRun xRun = (XWPFRun) run;
+                if (xRun.getCTR().getRsidDel() == null) {
+                    out.append(xRun.toString());
+                }
+            }

[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #3 from Dominik Stadler <[hidden email]> ---
Here is the output:

bffvalidator c:\temp\58061good.xls
BFFValidator: "c:\temp\58061good.xls" FAILED at 06/22/15 16:42:09
Log at: c:\temp\58061good.xls.bffvalidator.06-22-15_16-42-09.xml
See:
http://msdn.microsoft.com/en-us/library/A6FFF2B4-470A-463D-A6E9-9DAD9676CD44
for more information


bffvalidator c:\temp\58061corrupt.xls
BFFValidator: "c:\temp\58061corrupt.xls" NOT RECOGNIZED (The Microsoft Office
Binary File Format Validator encountered an error reading the file you
specified, OR The Microsoft Office Binary File Format Validator supports Word,
Excel, and PowerPoint binary file formats only. The file you specified is an
unsupported file type.) at 06/22/15 16:42:14
Log at: c:\temp\58061corrupt.xls.bffvalidator.06-22-15_16-42-14.xml

--- Comment #4 from Dominik Stadler <[hidden email]> ---
Sorry, wrong bug! Please disregard comment #3.

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #5 from Dominik Stadler <[hidden email]> ---
Thanks for the patch; it is now applied via r1722715.

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |61787


Referenced Bugs:

https://bz.apache.org/bugzilla/show_bug.cgi?id=61787
[Bug 61787] Text extraction omitting text incorrectly