[VOTE] Apache POI 4.0.1 release (RC1)

kiwiwings
Hi,

I've prepared artifacts for the release of Apache POI 4.0.1 (RC1).

The most notable changes in this release are:

- dependency updates to XMLBeans 3.0.2 / Bouncycastle 1.60
- XSSF: import chart on drawing
- XDDF: Define XDDF user model for text body, its paragraphs and text runs
- OPC: fixes on the newly introduced commons compress usage

https://dist.apache.org/repos/dist/dev/poi/4.0.1-RC1/

All tests pass. ASC files verify and SHA* are correct.
There's no clutter in the src/bin packages.
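
For reviewers who want to repeat the checksum part of that check by hand, here is a minimal sketch in plain Java. The class and method names are hypothetical and are not part of the POI build. It assumes the published digest is available as a bare hex string, and it does not cover the ASC (signature) verification.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative helper (hypothetical name), not part of the POI build. */
final class ChecksumCheck {

    /** Hex digest of a byte array; algorithm is e.g. "SHA-256" or "SHA-512". */
    static String hexDigest(String algorithm, byte[] data) throws NoSuchAlgorithmException {
        return toHex(MessageDigest.getInstance(algorithm).digest(data));
    }

    /** Streams the file through the digest so large archives need not fit in memory. */
    static String hexDigest(String algorithm, Path file)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // reading is what feeds the digest; nothing else to do here
            }
        }
        return toHex(md.digest());
    }

    /** True if the digests are equal, ignoring case and surrounding whitespace. */
    static boolean matches(String published, String computed) {
        return published.trim().equalsIgnoreCase(computed);
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // args: <algorithm> <file> <published hex digest>
        String computed = hexDigest(args[0], Paths.get(args[1]));
        System.out.println(matches(args[2], computed) ? "OK" : "MISMATCH: " + computed);
    }
}
```

Run it once per artifact and algorithm, passing the hex string from the matching checksum file as the third argument.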

There's a new OOXMLLite mechanism, please also check poi-ooxml-schemas.

Please vote to release the artifacts.
The vote stays open until we have feedback from Tika and a consensus on the govdocs results.

Planned release announcement date is 2018-11-26.

Here is my +1

Andi


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: [VOTE] Apache POI 4.0.1 release (RC1)

Tim Allison
Andi,
  Thank you!  I've built this locally and integrated it into Tika, and
I've kicked off the regression tests.  The one small glitch I noticed
so far is that poi-ooxml-schemas jar has an extra ".jar" in it:
build/dist/maven/poi-ooxml-schemas/poi-ooxml-schemas-4.0.1.jar.jar
   I'll let you all know when I have results.

              Best,

                     Tim
On Mon, Nov 19, 2018 at 6:47 PM Andreas Beeker <[hidden email]> wrote:

>
> Hi,
>
> I've prepared artifacts for the release of Apache POI 4.0.1 (RC1).
>
> The most notable changes in this release are:
>
> - dependency updates to XMLBeans 3.0.2 / Bouncycastle 1.60
> - XSSF: import chart on drawing
> - XDDF: Define XDDF user model for text body, its paragraphs and text runs
> - OPC: fixes on the newly introduced commons compress usage
>
> https://dist.apache.org/repos/dist/dev/poi/4.0.1-RC1/
>
> All tests pass. ASC files verify and SHA* are correct.
> There's no clutter in the src/bin packages.
>
> There's a new OOXMLLite mechanism, please also check poi-ooxml-schemas.
>
> Please vote to release the artifacts.
> The vote stays open until we have feedback from Tika and a consensus on the govdocs results.
>
> Planned release announcement date is 2018-11-26.
>
> Here is my +1
>
> Andi

Re: [VOTE] Apache POI 4.0.1 release (RC1)

kiwiwings
Hi Tim,

which extra ".jar"??? :)

... just kidding, I've fixed it and thank you!

I was already suspicious that the release process was going too smoothly.
The only hiccup was that I had to use my local version of commons-openpgp. [1]

Andi

[1] https://issues.apache.org/jira/browse/SANDBOX-508


On 20.11.18 22:33, Tim Allison wrote:

> Andi,
>    Thank you!  I've built this locally and integrated it into Tika, and
> I've kicked off the regression tests.  The one small glitch I noticed
> so far is that poi-ooxml-schemas jar has an extra ".jar" in it:
> build/dist/maven/poi-ooxml-schemas/poi-ooxml-schemas-4.0.1.jar.jar
>     I'll let you all know when I have results.
>
>                Best,
>
>                       Tim
>



Re: [VOTE] Apache POI 4.0.1 release (RC1)

Tim Allison
>which extra ".jar"???

The first one, clearly. :D  Thank you!
On Tue, Nov 20, 2018 at 5:01 PM Andreas Beeker <[hidden email]> wrote:

>
> Hi Tim,
>
> which extra ".jar"??? :)
>
> ... just kidding, I've fixed it and thank you!
>
> I was already suspicious that the release process was going too smoothly.
> The only hiccup was that I had to use my local version of commons-openpgp. [1]
>
> Andi
>
> [1] https://issues.apache.org/jira/browse/SANDBOX-508
>

Re: [VOTE] Apache POI 4.0.1 release (RC1)

Dominik Stadler
In reply to this post by kiwiwings
Hi,

I started comparing things: -bin-..zip looks good, and the contents of
poi-ooxml-schemas look sane.

However, the maven subdirectory is now missing the "ooxml-schemas" directory. I
think this should still be included, shouldn't it? In 4.0.0 it contains
ooxml-schemas-1.4.jar/ooxml-schemas-1.4-sources.jar and checksums.

Thanks... Dominik.

On Tue, Nov 20, 2018 at 12:47 AM Andreas Beeker <[hidden email]>
wrote:

> Hi,
>
> I've prepared artifacts for the release of Apache POI 4.0.1 (RC1).
>
> The most notable changes in this release are:
>
> - dependency updates to XMLBeans 3.0.2 / Bouncycastle 1.60
> - XSSF: import chart on drawing
> - XDDF: Define XDDF user model for text body, its paragraphs and text runs
> - OPC: fixes on the newly introduced commons compress usage
>
> https://dist.apache.org/repos/dist/dev/poi/4.0.1-RC1/
>
> All tests pass. ASC files verify and SHA* are correct.
> There's no clutter in the src/bin packages.
>
> There's a new OOXMLLite mechanism, please also check poi-ooxml-schemas.
>
> Please vote to release the artifacts.
> The vote stays open until we have feedback from Tika and a consensus on
> the govdocs results.
>
> Planned release announcement date is 2018-11-26.
>
> Here is my +1
>
> Andi

Re: [VOTE] Apache POI 4.0.1 release (RC1)

pj.fanning
I found a few missing classes in poi-ooxml-schemas.jar.
We have some gaps in the XDDF test coverage, so not all of the OOXML classes
that XDDF needs are being added to poi-ooxml-schemas.jar.

https://github.com/apache/poi/commit/df83dab1a49900d85d9a20c0ee6d5a7a31f0eb9c
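
A quick way to spot such gaps is to compare the jar's entries against a list of classes you expect to find. The following is only a sketch using the JDK's jar API. The class name SchemaJarCheck is made up for illustration, and the class names to look for are supplied by the caller; nothing here is taken from the commit above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Illustrative helper (hypothetical name), not part of the POI build. */
final class SchemaJarCheck {

    /** Converts a class name such as "a.b.C" into its jar entry name "a/b/C.class". */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns the required classes that have no matching jar entry, in input order. */
    static List<String> missingClasses(Set<String> jarEntries, Collection<String> required) {
        List<String> missing = new ArrayList<>();
        for (String cls : required) {
            if (!jarEntries.contains(entryName(cls))) {
                missing.add(cls);
            }
        }
        return missing;
    }

    /** Reads all entry names from a jar file on disk. */
    static Set<String> entriesOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(entry -> entry.getName())
                    .collect(Collectors.toCollection(HashSet::new));
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the jar; remaining args: fully qualified class names to look for
        List<String> required = Arrays.asList(args).subList(1, args.length);
        System.out.println("missing: " + missingClasses(entriesOf(args[0]), required));
    }
}
```

Nested classes use "$" in their entry names, so pass them in that form (for example Outer$Inner).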



--
Sent from: http://apache-poi.1045710.n5.nabble.com/POI-Dev-f2312866.html


Re: [VOTE] Apache POI 4.0.1 release (RC1)

kiwiwings
In reply to this post by Dominik Stadler
On 21.11.18 07:09, Dominik Stadler wrote:
> However, the maven subdirectory is now missing the "ooxml-schemas" directory. I
> think this should still be included, shouldn't it? In 4.0.0 it contains
> ooxml-schemas-1.4.jar/ooxml-schemas-1.4-sources.jar and checksums.

I provided ooxml-schemas 1.4 as part of the 4.0.0 release because I had changed the access to the POIXMLTypeLoader. AFAIK nothing has changed in ooxml-schemas for 4.0.1, hence I have omitted a new version.



Re: [VOTE] Apache POI 4.0.1 release (RC1)

kiwiwings
In reply to this post by pj.fanning
On 21.11.18 10:47, pj.fanning wrote:
> I found a few missing classes in poi-ooxml-schemas.jar.

Is this now a "-1", i.e. a must-have, because otherwise we'll get a lot of Stack Overflow questions complaining about it?

... or a "-0", i.e. a nice-to-have, since users can fall back to the full schema jar until 4.0.2 is out?


I'm asking because there have already been a few changes to trunk since I provided the RC, and we might have to do another Tika/POI Common Crawl run... which I would like to avoid.

Andi



Re: [VOTE] Apache POI 4.0.1 release (RC1)

pj.fanning
I'm +1 -- the fix can wait till 4.0.2 and there is the workaround of using the full
schema jar.




Re: [VOTE] Apache POI 4.0.1 release (RC1)

Tim Allison
In reply to this post by kiwiwings
Reports are available here:
http://162.242.228.174/reports/reports_poi_4_0_1-rc1.tgz

We have a bunch less content in ppt, but I _think_ this is because at
the Tika level we used to duplicate notes content, and we've fixed
that bug.  So, I think this is an improvement, but I need to check.
On Wed, Nov 21, 2018 at 12:05 PM Andreas Beeker <[hidden email]> wrote:

>
> On 21.11.18 10:47, pj.fanning wrote:
> > I found a few missing classes in poi-ooxml-schemas.jar.
>
> Is this now a "-1", i.e. a must-have, because otherwise we'll get a lot of Stack Overflow questions complaining about it?
>
> ... or a "-0", i.e. a nice-to-have, since users can fall back to the full schema jar until 4.0.2 is out?
>
>
> I'm asking because there have already been a few changes to trunk since I provided the RC, and we might have to do another Tika/POI Common Crawl run... which I would like to avoid.
>
> Andi

Re: [VOTE] Apache POI 4.0.1 release (RC1)

Tim Allison
Y, my suspicion holds up.  If you look at TOP_10_UNIQUE_TOKEN_DIFFS_A
in content_diffs_with_exceptions.xlsx, there aren't any unique words
we were extracting with 4.0.0 that we're not extracting with 4.0.1 in
the vast majority of ppt files.  Note, too, that while the number of
tokens differs, the number of unique tokens does not...for the
majority of ppt.

It looks like we have lost some content in docx template files, e.g.:
http://162.242.228.174/docs/commoncrawl2/KQ/KQQ5VZ6BBBRCZPY4GDUIEMVPSGABOMM4

We used to get 17 unique words from this, and we now get just
1...we've lost: de: 2 | la: 2 | 03: 1 | 06: 1 | 1: 1 | 16: 1 | 2009: 1
| 3: 1 | conciencia: 1 | despertar: 1

These were in the header...I have to step away from the keyboard for
now...any ideas?
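
The kind of per-file comparison described above (which tokens one version extracted and the other did not) can be sketched in a few lines of Java. This is only an illustration: the class name is made up, the sample strings are invented, and the simple split on non-alphanumeric characters is an assumption that does not reproduce the tokenizer used for the actual reports.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative helper (hypothetical name); not the code that produced the reports. */
final class TokenDiff {

    /** Counts lower-cased tokens; splitting on non-letter/non-digit runs is a simplification. */
    static Map<String, Integer> countTokens(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Tokens present in the old extraction but absent from the new one, with their old counts. */
    static Map<String, Integer> lostTokens(String oldText, String newText) {
        Map<String, Integer> lost = countTokens(oldText);
        lost.keySet().removeAll(countTokens(newText).keySet());
        return lost;
    }

    public static void main(String[] args) {
        String before = "El despertar de la conciencia"; // invented sample text
        String after = "conciencia";                     // invented sample text
        System.out.println(lostTokens(before, after));
    }
}
```

Comparing only the key sets, as lostTokens does, mirrors the "unique tokens" view: a token that merely occurs fewer times in the new extraction is not reported as lost.
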
On Wed, Nov 21, 2018 at 12:37 PM Tim Allison <[hidden email]> wrote:

>
> Reports are available here:
> http://162.242.228.174/reports/reports_poi_4_0_1-rc1.tgz
>
> We have a bunch less content in ppt, but I _think_ this is because at
> the Tika level we used to duplicate notes content, and we've fixed
> that bug.  So, I think this is an improvement, but I need to check.

Re: [VOTE] Apache POI 4.0.1 release (RC1)

Tim Allison
> These were in the header...I have to step away from the keyboard for
> now...any ideas?

I confirmed this by flipping between 4.0.0 and 4.0.1 in our dependencies
while using the same Tika SNAPSHOT for both.  This is not caused by a
different version of Tika.
On Wed, Nov 21, 2018 at 12:53 PM Tim Allison <[hidden email]> wrote:

>
> Y, my suspicion holds up.  If you look at TOP_10_UNIQUE_TOKEN_DIFFS_A
> in content_diffs_with_exceptions.xlsx, there aren't any unique words
> we were extracting with 4.0.0 that we're not extracting with 4.0.1 in
> the vast majority of ppt files.  Note, too, that while the number of
> tokens differs, the number of unique tokens does not...for the
> majority of ppt.
>
> It looks like we have lost some content in docx template files, e.g.:
> http://162.242.228.174/docs/commoncrawl2/KQ/KQQ5VZ6BBBRCZPY4GDUIEMVPSGABOMM4
>
> We used to get 17 unique words from this, and we now get just
> 1...we've lost: de: 2 | la: 2 | 03: 1 | 06: 1 | 1: 1 | 16: 1 | 2009: 1
> | 3: 1 | conciencia: 1 | despertar: 1
>
> These were in the header...I have to step away from the keyboard for
> now...any ideas?

Re: [VOTE] Apache POI 4.0.1 release (RC1)

Tim Allison
In the debugger:

XWPFHeaderFooterPolicy hfPolicy = document.getHeaderFooterPolicy();

The hfPolicy in 4.0.1 has nothing in it, whereas in 4.0.0 there's a
firstPageHeader (header3.xml), an evenPageHeader (header1.xml), a
defaultHeader (header2.xml) and a defaultFooter (footer1).

This looks like a regression.



On Wed, Nov 21, 2018 at 12:56 PM Tim Allison <[hidden email]> wrote:
>
> > These were in the header...I have to step away from the keyboard for
> > now...any ideas?
>
> I confirmed this by flipping between 4.0.0 and 4.0.1 in our dependencies
> while using the same Tika SNAPSHOT for both.  This is not caused by a
> different version of Tika.


Re: [VOTE] Apache POI 4.0.1 release (RC1)

pj.fanning
Tim - would you be able to provide some sample files so we can add some
regression tests?




Re: [VOTE] Apache POI 4.0.1 release (RC1)

Dominik Stadler
In reply to this post by kiwiwings
Ok, thanks for the explanation, I am +1 from this point of view then.

Dominik.



On Wed, Nov 21, 2018, 18:00 Andreas Beeker <[hidden email]> wrote:

> On 21.11.18 07:09, Dominik Stadler wrote:
> > However, the maven subdirectory is now missing the "ooxml-schemas" directory. I
> > think this should still be included, shouldn't it? In 4.0.0 it contains
> > ooxml-schemas-1.4.jar/ooxml-schemas-1.4-sources.jar and checksums.
>
> I provided ooxml-schemas 1.4 as part of the 4.0.0 release because I had
> changed the access to the POIXMLTypeLoader. AFAIK nothing has changed in
> ooxml-schemas for 4.0.1, hence I have omitted a new version.

Re: [VOTE] Apache POI 4.0.1 release (RC1)

kiwiwings
In reply to this post by Tim Allison
Hi Tim,

On 21.11.18 19:26, Tim Allison wrote:
> This looks like a regression.


Please make up your mind whether this is a "-1".

Creating a new RC is not a big deal; I'm only thinking about the hours of computing power needed to process the Common Crawl.

Andi



Re: [VOTE] Apache POI 4.0.1 release (RC1)

Tim Allison
Sorry, now that I've figured out what the problem was, I'm -1.  Y,
let's respin.
On Thu, Nov 22, 2018 at 4:34 PM Andreas Beeker <[hidden email]> wrote:

>
> Hi Tim,
>
> On 21.11.18 19:26, Tim Allison wrote:
> > This looks like a regression.
>
>
> Please make up your mind whether this is a "-1".
>
> Creating a new RC is not a big deal; I'm only thinking about the hours of computing power needed to process the Common Crawl.
>
> Andi

Re: [VOTE] Apache POI 4.0.1 release (RC1)

Dominik Stadler
Hi,

If possible, we should also review
https://bz.apache.org/bugzilla/show_bug.cgi?id=62943 and ensure 4.0.1 handles
it correctly, as it sounds as if we may have broken compatibility with JDK 8
somehow.

Dominik.

On Fri, Nov 23, 2018 at 2:34 PM Tim Allison <[hidden email]> wrote:

> Sorry, now that I've figured out what the problem was, I'm -1.  Y,
> let's respin.

Re: [VOTE] Apache POI 4.0.1 release (RC1)

pj.fanning
The fix for https://bz.apache.org/bugzilla/show_bug.cgi?id=62943 was to bring
back a try/catch block around the code; that block had been removed in 4.0.0.
The 4.0.0 issue seems to affect users with very old XML parsers, typically
application servers that ship their own non-standard parsers.
I'm happy to have someone review the change, but I think it is fixed.
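
To illustrate the general pattern (this is not the actual POI change), a defensive wrapper around a JAXP feature call might look like the sketch below. The class and method names are made up. Catching AbstractMethodError rests on the assumption that a very old parser implementation may not implement setFeature at all.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;

/** Illustrative sketch (hypothetical name); not the code from the POI fix. */
final class LenientSaxSetup {

    /**
     * Tries to set a feature and reports whether it worked instead of letting
     * the failure propagate. AbstractMethodError is caught as well because a
     * very old parser implementation may predate this method (an assumption).
     */
    static boolean trySetFeature(SAXParserFactory factory, String feature, boolean enabled) {
        try {
            factory.setFeature(feature, enabled);
            return true;
        } catch (Exception | AbstractMethodError e) {
            System.err.println("SAX feature not supported: " + feature + " (" + e + ")");
            return false;
        }
    }

    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        boolean secure = trySetFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true);
        System.out.println("secure processing enabled: " + secure);
    }
}
```

The trade-off is that a parser which rejects a feature keeps working without it, so the caller should log or otherwise surface the false return value when the feature matters for security.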




Re: [VOTE] Apache POI 4.0.1 release (RC1)

pj.fanning
The real fix is https://bz.apache.org/bugzilla/show_bug.cgi?id=62692 -
XMLBeans-519 fixes a similar issue in XMLBeans (the SAXHelper code in
XMLBeans is basically a copy/paste of the POI equivalent).


