POI 4.0.0 issues with new commons-compress library "InputStream of class [..] is not implementing InputStreamStatistics"


Jörn Franke
Dear all,

as part of the HadoopOffice library (
https://github.com/zuinnote/hadoopoffice/wiki) we provide the functionality
to read office documents, such as MS Excel, on Big Data platforms, such as
Hadoop/Hive/Spark/Flink.

I want to release a new version supporting POI 4.0.0, but I have one
remaining blocking issue: The Big Data platforms use an old version of
commons-compress (between 1.4.x and 1.9.x). This means I am always running
into the exception in ZipArchiveThresholdInputStream "InputStream of class
[..] is not implementing InputStreamStatistics" (
https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/openxml4j/util/ZipArchiveThresholdInputStream.java?view=markup&pathrev=1832789
).
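
To confirm which commons-compress a given platform actually loads, a small classpath probe can help. The following is a minimal sketch, not part of POI or HadoopOffice: the class name `CompressClasspathCheck` and its helper methods are hypothetical, and it assumes that `InputStreamStatistics` lives in the package `org.apache.commons.compress.utils`, as in recent commons-compress releases.

```java
import java.security.CodeSource;

public class CompressClasspathCheck {

    // Assumed fully qualified name of the interface that the POI check expects.
    static final String STATS_INTERFACE =
            "org.apache.commons.compress.utils.InputStreamStatistics";

    // Class whose origin tells us which commons-compress jar is in use.
    static final String ZIP_STREAM =
            "org.apache.commons.compress.archivers.zip.ZipArchiveInputStream";

    // Returns true if the named class can be found by this class's class loader.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, CompressClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the jar or directory the class was loaded from, or a placeholder.
    static String locationOf(String className) {
        try {
            Class<?> c = Class.forName(className, false,
                    CompressClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not found)";
        }
    }

    public static void main(String[] args) {
        System.out.println("ZipArchiveInputStream loaded from: " + locationOf(ZIP_STREAM));
        System.out.println("InputStreamStatistics available:   " + isLoadable(STATS_INTERFACE));
    }
}
```

Run inside the platform (for example as part of a Spark or Hadoop job), this shows whether the platform's own commons-compress jar wins over the one bundled with the application. If `InputStreamStatistics` is reported as unavailable, the POI check described above can be expected to fail.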

Unfortunately, updating these platforms to the latest commons-compress is
very intrusive and, for many organizations, not possible. I now need to find
a workaround for this. Alternative classpath settings do not work very
well and create a mess of their own.

Do you have any idea how I can deal with this check? Can I somehow inject
InputStreamStatistics into my InputStream? Or can I somehow inject my
own ZipArchiveInputStream?
Alternatively, instead of using ZipArchiveInputStream, could Apache POI
create another class, POIZipArchiveInputStream, and let this custom class
extend ArchiveInputStream and implement InputStreamStatistics? This would
remove all my classpath issues with the Big Data platforms.


Thank you.

Best regards

Re: POI 4.0.0 issues with new commons-compress library "InputStream of class [..] is not implementing InputStreamStatistics"

Jörn Franke
Hi Nick,

thank you for the quick response. It is already on the POI web page. I
fully agree with you that we should always use the latest version with
security fixes (I already started to file bugs with some of the platforms).
With dependency shading this is possible in my case. The developers/users
will need to shade the dependencies for their application, but I provide
examples, so it is not such a big issue to change.

best regards

On Sun, Sep 30, 2018 at 12:43 AM Nick Burch <[hidden email]> wrote:

> On Sat, 29 Sep 2018, Jörn Franke wrote:
> > as part of the HadoopOffice library (
> > https://github.com/zuinnote/hadoopoffice/wiki) we provide the
> > functionality to read office documents, such as MS Excel, on Big Data
> > platforms, such as Hadoop/Hive/Spark/Flink.
>
> We should probably list that on the website! Do you have a few paragraph
> blurb we can use?
>
> > I want to release a new version supporting POI 4.0.0, but I have one
> > remaining blocking issue: The Big Data platforms use an old version of
> > commons-compress (between 1.4.x and 1.9.x). This means I am always
> > running into the exception in ZipArchiveThresholdInputStream
> > "InputStream of class [..] is not implementing InputStreamStatistics" (
> > https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/openxml4j/util/ZipArchiveThresholdInputStream.java?view=markup&pathrev=1832789
> > ).
>
> We need that for security reasons - newer Java versions won't let us
> protect against zip bomb attacks as they inconveniently hide the expansion
> stats, so we had to switch to commons to guard against it.
>
> > Unfortunately, updating these platforms to the latest commons-compress is
> > very intrusive and for many organizations not possible.
>
> Wave some CVEs at them and see if you can tempt an upgrade?
>
> If not, you'd probably need to work with the commons folks to backport the
> zip stats stuff to your old version, so you can keep the security stuff we
> need? dev@commons is moderately quiet and fairly friendly :)
>
> Nick

Re: POI 4.0.0 issues with new commons-compress library "InputStream of class [..] is not implementing InputStreamStatistics"

pj.fanning
In reply to this post by Jörn Franke
I just logged https://issues.apache.org/jira/browse/HADOOP-15804, but
dependency upgrades in Hadoop can take a long time because of the large number
of projects and the complex relationships between them.
