[Bug 61475] New: Duplication of content in some XWPF

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61475

            Bug ID: 61475
           Summary: Duplication of content in some XWPF
           Product: POI
           Version: unspecified
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XWPF
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

Created attachment 35274
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35274&action=edit
example docx

In regression tests for 3.17-rc2, I found some duplication of content in Tika,
and this is replicated with POI's XWPFWordExtractor.

        XWPFDocument doc = XWPFTestDataSamples.openSampleDocument("dupe1.docx");
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        // The duplicated paragraph shows up in the extracted text.
        String text = extractor.getText();

In the attached file, "When readers open..." should only appear once, but it
appears twice.
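
A rough way to turn that observation into a check is to count how often the
phrase occurs in the extracted text. This is only a sketch in plain Java with no
POI dependency: the class DuplicateCheck and the helper countOccurrences are
hypothetical names, and the sample string stands in for what
extractor.getText() would return.

```java
public class DuplicateCheck {

    // Hypothetical helper: counts non-overlapping occurrences of phrase in text.
    static int countOccurrences(String text, String phrase) {
        if (text == null || phrase == null || phrase.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(phrase, from)) != -1) {
            count++;
            from += phrase.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for extractor.getText(); not the real content of the attachment.
        String extracted = "When readers open...\nWhen readers open...\n";
        int n = countOccurrences(extracted, "When readers open");
        if (n > 1) {
            System.out.println("Phrase appears " + n + " times");
        }
    }
}
```

In a regression test, the same count could be asserted to equal 1 for the
attached file.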

Full reports are here:
http://162.242.228.174/reports/poi-3.17-rc2-docx.tar.gz

Roughly 8,000 of ~170k docx files apparently have at least some duplicated
content.  Some of the extra content can be explained by the phonetic/ruby issue,
but not the majority.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[Bug 61475] Duplication of content in some XWPF

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61475

Tim Allison <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #1 from Tim Allison <[hidden email]> ---
My fault on 61740.

The appending of the picture text slipped into the loop instead of being
applied after it.

        // Any picture text?
        if (pictureText != null && pictureText.length() > 0) {
            text.append("\n").append(pictureText);
        }
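
To show the effect of that misplacement in isolation, here is a minimal sketch
in plain Java. It is not the POI code: the class PictureTextAppend, both method
names, and the list of string parts are hypothetical stand-ins for the loop
that builds the text.

```java
import java.util.Arrays;
import java.util.List;

public class PictureTextAppend {

    // Buggy shape: the picture text is appended on every loop iteration.
    static String appendInsideLoop(List<String> parts, String pictureText) {
        StringBuilder text = new StringBuilder();
        for (String part : parts) {
            text.append(part);
            if (pictureText != null && pictureText.length() > 0) {
                text.append("\n").append(pictureText);
            }
        }
        return text.toString();
    }

    // Intended shape: the picture text is appended once, after the loop.
    static String appendAfterLoop(List<String> parts, String pictureText) {
        StringBuilder text = new StringBuilder();
        for (String part : parts) {
            text.append(part);
        }
        if (pictureText != null && pictureText.length() > 0) {
            text.append("\n").append(pictureText);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b");
        System.out.println(appendInsideLoop(parts, "pic")); // picture text repeated
        System.out.println(appendAfterLoop(parts, "pic"));  // picture text once
    }
}
```

With more than one part in the loop, the first variant repeats the picture
text, which matches the kind of duplication described in this report.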

[Bug 61475] Duplication of content in some XWPF

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61475

Tim Allison <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Tim Allison <[hidden email]> ---
Fixed in r1806839.
