POIXMLTextExtractor.java : IllegalStateException

Scott Gardner
Can someone explain what causes IllegalStateException to be thrown in
POIXMLTextExtractor.java?

In the file  org/apache/poi/POIXMLTextExtractor.java is this if statement

   if(size > ZipSecureFile.getMaxTextSize()) {
      throw new IllegalStateException("The text would exceed the max allowed overall size of extracted text. "
        + "By default this is prevented as some documents may exhaust available memory and it may indicate that the file is used to inflate memory usage and thus could pose a security risk. "
        + "You can adjust this limit via ZipSecureFile.setMaxTextSize() if you need to work with files which have a lot of text. "
        + "Size: " + size + ", limit: MAX_TEXT_SIZE: " + ZipSecureFile.getMaxTextSize());
   }

Can someone tell me exactly what causes this message to be printed? What
does "The text" mean in the context of that message?
Can someone give me a .zip file that will cause this message to appear, and
explain what it is about the contents of the .zip file that causes it to be
printed?

Re: POIXMLTextExtractor.java : IllegalStateException

pj.fanning
The code is there to protect against https://en.wikipedia.org/wiki/Zip_bomb



--
Sent from: http://apache-poi.1045710.n5.nabble.com/POI-User-f2280730.html


Re: POIXMLTextExtractor.java : IllegalStateException

Scott Gardner-2
In reply to this post by Scott Gardner
I understand that, but specifically what is it in a .zip file that will
cause this if statement to throw the IllegalStateException?  I don't
understand where the values of text.length() and string.length() are coming
from.

       int size = text.length() + string.length();
       if(size > ZipSecureFile.getMaxTextSize()) {

I'm getting this exception and I don't know what (in the .zip file) is
causing this to be thrown.

The text would exceed the max allowed overall size of extracted text. By
default this is prevented as some documents may exhaust available memory
and it may indicate that the file is used to inflate memory usage and thus
could pose a security risk. You can adjust this limit via
ZipSecureFile.setMaxTextSize() if you need to work with files which have a
lot of text. Size: 10485785, limit: MAX_TEXT_SIZE: 10485760


Re: POIXMLTextExtractor.java : IllegalStateException

kiwiwings
Hi Scott,

> I don’t understand where the values of text.length() and string.length() are coming from.
They are the parameters of the method "checkMaxTextSize".

Where is "checkMaxTextSize" called and what are the parameters?
Please use a decent IDE, and check the references to the method!

OK, the method is called in several places in XSSFExcelExtractor,
e.g.
line 165: String contents = cell.getCellFormula();
line 166: checkMaxTextSize(text, contents);

So, as the exception message says, it checks the overall size: <text> is the output accumulated so far, and checkMaxTextSize is used to see whether adding the next piece, <contents>, would push it over the threshold.
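
To make the accumulation concrete, here is a minimal stand-alone sketch of that check. It is not the POI source and does not depend on POI: the class name TextSizeCheck, the extract() loop and the null guard are made up for illustration, and the limit is simply the 10485760 shown in the reported message.

```java
// Stand-alone sketch of the accumulate-and-check pattern; not POI code.
class TextSizeCheck {
    // Assumed limit, copied from "MAX_TEXT_SIZE: 10485760" in the reported message.
    static final long MAX_TEXT_SIZE = 10L * 1024 * 1024;

    // Same comparison as the quoted snippet: text so far plus the next piece.
    static void checkMaxTextSize(CharSequence text, String string) {
        if (string == null) {
            return; // guard added for this sketch
        }
        int size = text.length() + string.length();
        if (size > MAX_TEXT_SIZE) {
            throw new IllegalStateException(
                "Size: " + size + ", limit: MAX_TEXT_SIZE: " + MAX_TEXT_SIZE);
        }
    }

    // Checks each piece before appending it, the way an extractor walks over cells.
    static String extract(Iterable<String> pieces) {
        StringBuilder text = new StringBuilder();
        for (String piece : pieces) {
            checkMaxTextSize(text, piece);
            text.append(piece);
        }
        return text.toString();
    }
}
```

With 10485760 characters already accumulated, a 25-character piece produces "Size: 10485785", the number in the reported message. That is only one combination that gives this total, not necessarily what happened in the actual file.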

What else is accumulated?
That depends on how you configured the extractor ... just check its source! [1]

> I'm getting this exception and I don't know what (in the .zip file) is
> causing this to be thrown.
So your file seems to be nearly empty?
Maybe select everything and remove the comments?

To find out what is really causing this, I would copy the extractor class and add some counters.
Just use a debugger and give it a try!
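
One possible shape for such counters, as a rough sketch only: TextTally is a hypothetical helper, not part of POI. In a copied extractor you would call add() next to each checkMaxTextSize(text, ...) call, with a label of your own choosing for the call site.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for a copied extractor class; not part of POI.
class TextTally {
    // Characters contributed per call site, keyed by a label chosen by the caller.
    private final Map<String, Long> charsBySource = new LinkedHashMap<>();

    // Record the length of one piece of text under the given label.
    void add(String source, String contents) {
        if (contents != null) {
            charsBySource.merge(source, (long) contents.length(), Long::sum);
        }
    }

    // Characters recorded for one label, or 0 if the label was never used.
    long count(String source) {
        return charsBySource.getOrDefault(source, 0L);
    }

    // Characters recorded across all labels.
    long total() {
        long sum = 0;
        for (long n : charsBySource.values()) {
            sum += n;
        }
        return sum;
    }

    // Label with the highest character count, or null if nothing was recorded.
    String largestSource() {
        String best = null;
        long bestCount = -1;
        for (Map.Entry<String, Long> e : charsBySource.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

Printing largestSource() and total() just before the exception is thrown should show whether formulas, comments or string cells account for most of the accumulated text.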

Andi




[1]

if (formulasNotResults) {
    String contents = cell.getCellFormula();
    checkMaxTextSize(text, contents);

...

if(includeCellComments && comment != null) {
    // Replace any newlines with spaces, otherwise it breaks the output
    String commentText = comment.getString().getString().replace('\n', ' ');
    checkMaxTextSize(text, commentText);

...

private void handleStringCell(StringBuilder text, Cell cell) {
    String contents = cell.getRichStringCellValue().getString();
    checkMaxTextSize(text, contents);

...

if (type == CellType.NUMERIC) {
    CellStyle cs = cell.getCellStyle();

    if (cs != null && cs.getDataFormatString() != null) {
        String contents = formatter.formatRawCellContents(cell.getNumericCellValue()...);
        checkMaxTextSize(text, contents);

...

// No supported styling applies to this cell
String contents = ((XSSFCell)cell).getRawValue();
if (contents != null) {
    checkMaxTextSize(text, contents);



Re: POIXMLTextExtractor.java : IllegalStateException

Scott Gardner
Thank you.  The developer I'm working with ran our code through the
debugger, and we now have a better understanding of how the value of "text"
is populated and how it changes over time.

Scott


Re: POIXMLTextExtractor.java : IllegalStateException

Dave Fisher-5
In reply to this post by Scott Gardner-2
Hi -

Take a look at this Wikipedia article for one example: https://en.wikipedia.org/wiki/Billion_laughs_attack

Without knowing what your file is supposed to be and without slicing and dicing it, it is hard to know.

Where did your file come from? The wild? If so then it may be worth discussing the file itself in private because it may be a security issue.

For further research I would unzip the file in your OS and then examine the header for each of the files contained in the zip. Note how much larger the unzipped files become.
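
If you would rather not unpack a suspicious file, a small sketch like the following lists the sizes recorded for each entry using only java.util.zip. The class name ZipEntrySizes is made up for this example. The sizes are the ones the archive itself declares, so a deliberately malformed file could misreport them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Illustrative helper: report declared sizes without extracting anything.
class ZipEntrySizes {
    // Returns one line per entry: name, compressed bytes, uncompressed bytes.
    static List<String> describe(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                lines.add(entry.getName()
                    + " compressed=" + entry.getCompressedSize()
                    + " uncompressed=" + entry.getSize());
            }
        }
        return lines;
    }
}
```

Since .xlsx, .docx and .pptx files are zip archives, describe() can be pointed at the document directly; entries whose uncompressed size dwarfs the compressed size are the ones to look at first.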

Regards,
Dave


Re: POIXMLTextExtractor.java : IllegalStateException

Dominik Stadler
In reply to this post by Scott Gardner-2
Hi,

I think there are two distinct types of security vulnerabilities that we
are talking about here. One is called something like "Zip bomb", the other
"XML bomb". Both try to get you to open a malicious file which causes a
huge expansion in memory and thus causes out-of-memory in your application
and through this a denial-of-service. One attacks at the zip-file level,
i.e. when uncompressing the ooxml-file during reading. The other on the
XML-content level which resides inside the ZIP-file.

An "XML Bomb" is a file which uses various XML features to cause a small
file to expand to a much larger one in memory. Typically multiple levels of
entity expansion are used for this. Apache POI protects against it by
disabling the XML parser features that allow such expansion to take place
at all.
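
As an illustration of this kind of hardening (a sketch under the assumption that the JDK's built-in parser is used, not the exact settings Apache POI applies), a DOM parser can be configured to reject any document that declares a DOCTYPE, which is where entity definitions live. The class name HardenedXml is made up for this example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;

// Illustrative only: one common way to switch off entity expansion in JAXP.
class HardenedXml {
    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Ask the parser to apply its built-in processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Xerces-style feature (supported by the JDK's default parser):
        // fail as soon as a DOCTYPE declaration is seen.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // True if the document parses under the hardened settings, false otherwise.
    static boolean parses(String xml) {
        try {
            newBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

A plain document such as <a>ok</a> still parses, while one that starts with a DOCTYPE declaring entities is rejected before any expansion happens.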

However you are actually looking at the code which protects Apache POI
against a "Zip Bomb", i.e. when a ZIP-file is created in a way which
expands to a much larger amount of memory when uncompressed. This is
probably done via writing lots of similar data which compresses very well,
however I did not look into details of this yet.

While extracting the zip file (all OOXML file types are actually compressed
zip files), Apache POI counts the compressed bytes and the resulting
uncompressed bytes. If the ratio of compressed to uncompressed bytes is
lower than a given threshold (i.e. the compressed data expands a lot), it
stops processing the file with an error to avoid this type of attack.
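
The ratio idea can be sketched with nothing but java.util.zip. This is a simplified illustration, not POI's implementation: the class name InflateRatio is made up, and the 0.01 threshold is an assumed value chosen for the example.

```java
import java.util.zip.Deflater;

// Simplified illustration of a compressed/uncompressed ratio check; not POI code.
class InflateRatio {
    // Assumed threshold for this sketch: flag data that expands more than 100 times.
    static final double MIN_RATIO = 0.01;

    // Number of bytes the given data occupies after DEFLATE compression.
    static int compressedLength(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Compressed size divided by uncompressed size; small values mean heavy expansion.
    static double ratio(long compressedBytes, long uncompressedBytes) {
        return (double) compressedBytes / uncompressedBytes;
    }

    // True when the data expands more than the threshold allows.
    static boolean looksLikeBomb(long compressedBytes, long uncompressedBytes) {
        return uncompressedBytes > 0
            && ratio(compressedBytes, uncompressedBytes) < MIN_RATIO;
    }
}
```

A megabyte of identical bytes compresses to roughly a thousandth of its size and falls below the assumed threshold, whereas ordinary content that only halves in size does not.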

If the file in question was produced by yourself, it is probably safe to
lower the threshold somewhat via the API. If it is an external file from an
untrusted source, you likely don't want to process the file; only a close
look at the actual ZIP data will tell you for sure.

Dominik
