Paragraph runs are wrongly considered to be deleted in review

Paragraph runs are wrongly considered to be deleted in review

Simon Gaeremynck
Hello all,

I think this is a bug, but I wanted to run it past the mailing list before opening a ticket.

I have a document in which every paragraph and run carries a zeroed-out rsidDel attribute. [1]
I’m having trouble finding reference documentation that describes the expected behavior, but from eyeballing the XML, it looks like w:rsidRDefault declares the value 00000000 to be the default ID, which would make it ignorable?

When calling paragraph.getText(), the runs are filtered out because they have an rsidDel attribute.
https://github.com/apache/poi/blob/trunk/src/ooxml/java/org/apache/poi/xwpf/usermodel/XWPFParagraph.java#L192

This was done as a fix for https://bz.apache.org/bugzilla/show_bug.cgi?id=58067 (getText() of XWPFParagraph returns deleted text if in "review" mode) but I wonder whether the right behavior is to compare the value of rsidDel to rsidRDefault (or filter out 00000000 values)?

You can find the source .docx file at https://s3.amazonaws.com/ally-dev/files/essay.docx

Let me know what you think.

Thanks!
Simon


[1] Example of paragraph runs
<w:p w:rsidDel="00000000" w:rsidP="00000000" w:rsidR="00000000" w:rsidRDefault="00000000" w:rsidRPr="00000000">
    <w:pPr>
        <w:pStyle w:val="Title"/>
        <w:contextualSpacing w:val="0"/>
        <w:rPr/>
    </w:pPr>
    <w:bookmarkStart w:colFirst="0" w:colLast="0" w:id="0" w:name="_u0zbcgllb07d"/>
    <w:bookmarkEnd w:id="0"/>
    <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
        <w:rPr>
            <w:rtl w:val="0"/>
        </w:rPr>
        <w:t xml:space="preserve">Personal Worldview Essay</w:t>
    </w:r>
</w:p>
<w:p w:rsidDel="00000000" w:rsidP="00000000" w:rsidR="00000000" w:rsidRDefault="00000000" w:rsidRPr="00000000">
    <w:pPr>
        <w:contextualSpacing w:val="0"/>
        <w:rPr/>
    </w:pPr>
    <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
        <w:rPr>
            <w:rtl w:val="0"/>
        </w:rPr>
    </w:r>
</w:p>
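
For illustration, the symptom can be reproduced without POI. The sketch below is not POI's code: it uses only the JDK's DOM parser, the class name RsidDelSymptom and the method text() are made up for this example, and the w:body wrapper is added only so that the w: prefix is declared. It contrasts a presence-based filter (skip any run that has a w:rsidDel attribute, as described above) with no filter at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; this is NOT how POI is implemented.
class RsidDelSymptom {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // First paragraph of the example, trimmed and wrapped in w:body
    // so that the w: namespace prefix is declared.
    static final String SAMPLE =
        "<w:body xmlns:w=\"" + W_NS + "\">"
        + "<w:p w:rsidDel=\"00000000\" w:rsidR=\"00000000\">"
        + "<w:r w:rsidDel=\"00000000\" w:rsidR=\"00000000\">"
        + "<w:t xml:space=\"preserve\">Personal Worldview Essay</w:t>"
        + "</w:r></w:p></w:body>";

    // Concatenates the w:t text of all runs. When skipRunsWithRsidDel is
    // true, any run that merely HAS a w:rsidDel attribute is dropped,
    // whatever its value.
    static String text(String xml, boolean skipRunsWithRsidDel) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            NodeList runs = doc.getElementsByTagNameNS(W_NS, "r");
            for (int i = 0; i < runs.getLength(); i++) {
                Element run = (Element) runs.item(i);
                if (skipRunsWithRsidDel && run.hasAttributeNS(W_NS, "rsidDel")) {
                    continue; // presence-based filter: the run is treated as deleted
                }
                NodeList texts = run.getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < texts.getLength(); j++) {
                    out.append(texts.item(j).getTextContent());
                }
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + text(SAMPLE, true) + "]");
        System.out.println("[" + text(SAMPLE, false) + "]");
    }
}
```

With the presence-based filter the title run is dropped and the result is an empty string; without it the result is "Personal Worldview Essay", which matches what Word displays.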




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Paragraph runs are wrongly considered to be deleted in review

Mark Murphy
Is POI producing this? What happens if you display the file with Word?

On Thu, Nov 16, 2017 at 10:47 AM, Simon Gaeremynck <[hidden email]>
wrote:


Re: Paragraph runs are wrongly considered to be deleted in review

Simon Gaeremynck
Hi Mark,

The file was not produced by POI; it was authored by another user (presumably in Word or another office editor).

All content displays fine in Word. Word's review statistics show 0 deletions; in fact, the file hasn’t been reviewed at all.

Thanks!
Simon

On 11/19/17, 2:19 AM, "Mark Murphy" <[hidden email]> wrote:

    Is POI producing this? What happens if you display the file with Word?
   


RE: Paragraph runs are wrongly considered to be deleted in review

Murphy, Mark
Based on my reading of the spec, the fix for bug 58067 appears to be incorrect: it keys on rsidDel, which is a revision session ID, not a marker for deleted content. Deletions are indicated by several tags, but the most important one for this bug is <delText>, the tag for deleted text in a run. I will create a bug for you.
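
As a rough sketch of that distinction (not POI's implementation), the following JDK-only example reads visible text from w:t and tracked-deleted text from w:delText, and ignores rsidDel entirely. The sample XML, the class name DelTextSketch, and the method collect() are made up for illustration; the w:del wrapper with w:id, w:author, and w:date attributes is the usual shape of a tracked deletion in WordprocessingML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; this is NOT how POI is implemented.
class DelTextSketch {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Made-up paragraph: one ordinary run that happens to carry a zeroed
    // rsidDel, followed by one tracked deletion (w:del wrapping w:delText).
    static final String SAMPLE =
        "<w:body xmlns:w=\"" + W_NS + "\"><w:p>"
        + "<w:r w:rsidDel=\"00000000\">"
        + "<w:t xml:space=\"preserve\">kept </w:t></w:r>"
        + "<w:del w:id=\"1\" w:author=\"someone\" w:date=\"2017-11-20T00:00:00Z\">"
        + "<w:r><w:delText>removed</w:delText></w:r>"
        + "</w:del>"
        + "</w:p></w:body>";

    // Concatenates the text of every element with the given local name
    // in the WordprocessingML namespace ("t" or "delText").
    static String collect(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            NodeList nodes = doc.getElementsByTagNameNS(W_NS, localName);
            for (int i = 0; i < nodes.getLength(); i++) {
                out.append(nodes.item(i).getTextContent());
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("visible: [" + collect(SAMPLE, "t") + "]");
        System.out.println("deleted: [" + collect(SAMPLE, "delText") + "]");
    }
}
```

Here the run with the zeroed rsidDel still contributes its text ("kept "), while the deleted text ("removed") is found only through w:delText.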

-----Original Message-----
From: Simon Gaeremynck [mailto:[hidden email]]
Sent: Sunday, November 19, 2017 7:23 AM
To: POI Users List <[hidden email]>
Subject: Re: Paragraph runs are wrongly considered to be deleted in review


RE: Paragraph runs are wrongly considered to be deleted in review

Murphy, Mark
Simon, if you could attach a simple document that exhibits this issue to Bugzilla issue #61787 (https://bz.apache.org/bugzilla/show_bug.cgi?id=61787), I would appreciate it.

-----Original Message-----
From: Murphy, Mark [mailto:[hidden email]]
Sent: Monday, November 20, 2017 8:45 AM
To: 'POI Users List' <[hidden email]>
Subject: RE: Paragraph runs are wrongly considered to be deleted in review
