Can't get the DOM Node text representation of the markup

Nadav Benedek
fix diff_iterator run line 325.
also check why the text extraction does show the new lines <w:br/>



I need to work with the underlying DOM nodes in order to manipulate a .docx file.
However, I can't find a way to get the information out of a DOM Node element.

Let's say I have a run: run

If I take the underlying DOM element, run.getCTR().getDomNode(),
I can't extract its text representation.

However, if I use Factory.parse, it works on some elements and not on others.
For example, this works:
CTRPr.Factory.parse(run.getCTR().getDomNode().getChildNodes().item(0))

But on CTText it doesn't:
CTText.Factory.parse(run.getCTR().getDomNode().getChildNodes().item(1))

Any idea how I can easily get all the data?
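
For the "text representation" part of the question, here is a minimal sketch that uses only the JDK (no POI calls). It assumes the Node returned by getDomNode() behaves like any other org.w3c.dom.Node, which I have not verified against POI 3.17. The class name DomNodeToXml, the helpers parseSampleRun and toXml, and the SAMPLE_RUN markup are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomNodeToXml {
    // Hypothetical stand-in for the markup of a single run.
    static final String SAMPLE_RUN =
        "<w:r xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
        + "<w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r>";

    // Parses SAMPLE_RUN with a namespace-aware parser and returns the <w:r> element.
    static Node parseSampleRun() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE_RUN)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    // Serializes any DOM node to a string with an identity transform,
    // leaving out the <?xml ...?> declaration.
    static String toXml(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize node", e);
        }
    }

    public static void main(String[] args) {
        Node run = parseSampleRun();
        // Second child of the run: the text element.
        System.out.println(toXml(run.getChildNodes().item(1)));
    }
}
```

With a real document, the same helper would presumably be called as toXml(run.getCTR().getDomNode()).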

2. When I use DOM nodes, I can easily traverse a run by calling run.getCTR().getDomNode().getChildNodes(). Is there a way to do this at the POI level, something like run.getAllElements/getAllChildren? I can't find it.


I am using POI 3.17

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Can't get the DOM Node text representation of the markup

kiwiwings
Hi Nadav,

Using the DOM node usually results in disconnected XML fragments.
When our schema doesn't cover the elements at hand, I always use XmlCursor.
Have a look, for instance, at the SignatureLine class [1].
There you can see how to add XmlObjects to an XML element that is declared with any content [2], and for which no accessor methods are therefore available.
XmlCursors can be used with XPath too, but it's usually faster to provide your own tree walker when looking for child elements in the hierarchy.
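
To illustrate the "own tree walker" idea, here is a rough sketch that walks the direct children of a run in document order. Note that it uses the plain JDK DOM API rather than XmlCursor, so it only shows the traversal pattern, not the XMLBeans calls. The class RunChildWalker, its helpers and the sample markup are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RunChildWalker {
    // Hypothetical run markup: properties, text, a line break, more text.
    static final String SAMPLE_RUN =
        "<w:r xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
        + "<w:rPr><w:b/></w:rPr><w:t>Hello</w:t><w:br/><w:t>World</w:t></w:r>";

    // Parses an XML string with a namespace-aware parser and returns its root element.
    static Node parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Collects the local names of the direct child elements, in document order.
    static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getLocalName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(childElementNames(parse(SAMPLE_RUN))); // prints [rPr, t, br, t]
    }
}
```

Walking siblings directly visits each child exactly once, whereas an XPath query has to be compiled and evaluated first, which is one plausible reason a hand-written walker tends to be faster for simple child lookups.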

Cheers,
Andi


[1] https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/poifs/crypt/dsig/SignatureLine.java?view=markup#l255
[2] https://svn.apache.org/viewvc/poi/trunk/src/ooxml/resources/org/apache/poi/schemas/vmlDrawing.xsd?view=markup#l30


Re: Can't get the DOM Node text representation of the markup

Nadav Benedek
Thanks Andreas,
1. I didn't get your reply in my email inbox; I only saw it when I logged in again to https://lists.apache.org/. Any idea why the system doesn't send email notifications?

2. I didn't understand from your answer how to get the list (in the right order) of XWPF elements found in a specific run. I guess XmlCursor can return the XML elements, but how do you get the higher-level XWPF abstraction?

3. Another weird behaviour: from these lines of code I get a fragment, but I don't see any <w:t> tag. Do you know why?


        CTText tmpTextNode = CTText.Factory.newInstance();
        tmpTextNode.setStringValue("A");
        XmlOptions opts = new XmlOptions();
        opts.setSaveOuter();
        opts.setSaveNoXmlDecl();
        System.out.println(tmpTextNode.xmlText(opts));

On 2020/10/15 20:48:57, Andreas Beeker <[hidden email]> wrote:

> Hi Nadav,
>
> Using the DOM node usually results in disconnected XML fragments.
> When our schema doesn't cover the elements at hand, I always use XmlCursor.
> Have a look, for instance, at the SignatureLine class [1].
> There you can see how to add XmlObjects to an XML element that is declared with any content [2], and for which no accessor methods are therefore available.
> XmlCursors can be used with XPath too, but it's usually faster to provide your own tree walker when looking for child elements in the hierarchy.
>
> Cheers,
> Andi
>
>
> [1] https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/poifs/crypt/dsig/SignatureLine.java?view=markup#l255
> [2] https://svn.apache.org/viewvc/poi/trunk/src/ooxml/resources/org/apache/poi/schemas/vmlDrawing.xsd?view=markup#l30
