Problems using IOUtils.setByteArrayMaxOverride()

Shifflett, David [USA]

I am using Tika to do content extraction on Visio (vsd) files, and I am running into an ‘Unexpected RuntimeException’.
The stack trace for this is in the attached stack-trace-withOUT-setByteArrayMaxOverride.txt file.

 

When I tried the suggested workaround of calling IOUtils.setByteArrayMaxOverride()
on the same file, I got the ‘Unexpected RuntimeException’ from a different part of the code.

It appears to me that when IOUtils.setByteArrayMaxOverride() is called with anything less than
Integer.MAX_VALUE, calls to toByteArray() will fail in checkLength(),
because the length input will be greater than BYTE_ARRAY_MAX_OVERRIDE.
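
To make the suspected mechanism concrete, here is a minimal, self-contained model of it. This is not POI's source: the class name OverrideModel, the field name, the exception type, and the method bodies are all assumptions reconstructed from the description above. In particular, it assumes that the no-length toByteArray() overload passes Integer.MAX_VALUE as a "read everything" sentinel, and that checkLength() compares that sentinel against the override.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the suspected logic; NOT the real org.apache.poi.util.IOUtils. */
class OverrideModel {
    // Assumption: -1 means "no override", mirroring the reset call used in this thread.
    static int byteArrayMaxOverride = -1;

    static void setByteArrayMaxOverride(int max) {
        byteArrayMaxOverride = max;
    }

    // Assumption: when an override is set, the requested length is compared against it.
    static void checkLength(long length, int maxLength) {
        int limit = byteArrayMaxOverride > 0 ? byteArrayMaxOverride : maxLength;
        if (length > limit) {
            throw new IllegalStateException(
                    "Tried to allocate " + length + " bytes, but the maximum is " + limit);
        }
    }

    // Assumption: the no-length overload uses Integer.MAX_VALUE as a "read to end" sentinel.
    static byte[] toByteArray(InputStream in) throws IOException {
        return toByteArray(in, Integer.MAX_VALUE, Integer.MAX_VALUE);
    }

    static byte[] toByteArray(InputStream in, int length, int maxLength) throws IOException {
        // The check sees the sentinel, not the number of bytes actually available.
        checkLength(length, maxLength);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int remaining = length;
        int n;
        while (remaining > 0
                && (n = in.read(buf, 0, Math.min(buf.length, remaining))) != -1) {
            out.write(buf, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    /** Returns true if reading a 3-byte stream is rejected under the given override. */
    static boolean failsWithOverride(int override) {
        setByteArrayMaxOverride(override);
        try {
            toByteArray(new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (IllegalStateException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            setByteArrayMaxOverride(-1); // always restore the "no override" state
        }
    }
}
```

In this model, failsWithOverride(30 * 1024 * 1024) returns true even though the stream holds only three bytes, while failsWithOverride(-1) and failsWithOverride(Integer.MAX_VALUE) return false. That matches the behavior described above, but it only illustrates the hypothesis; the stack traces remain the authoritative evidence.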

 

Here is a snippet of the code I am using:

    private void extract(InputStream is, Path outputDir, ContentHandler h, Metadata m,
            AutoDetectParser extractParser) throws SAXException, TikaException, IOException {
        Map retVal = new HashMap();
        ParseContext c = new ParseContext();

        c.set(Parser.class, extractParser);
        EmbeddedDocumentExtractor ex = new MY_EmbeddedDocumentExtractor(outputDir, c);
        c.set(EmbeddedDocumentExtractor.class, ex);

        // Override the POI maximum length for all record types
        // IOUtils.setByteArrayMaxOverride(100 * 1024 * 1024);
        // IOUtils.setByteArrayMaxOverride(30 * 1024 * 1024);
        extractParser.parse(is, h, m, c);

        // Reset/disable the override
        // IOUtils.setByteArrayMaxOverride(-1);
    }

 

As you can see from the commented-out IOUtils.setByteArrayMaxOverride() calls,
I tried this with both 100 MB and 30 MB.
The stack trace for the second error (with IOUtils.setByteArrayMaxOverride() called)
is attached as stack-trace-with-setByteArrayMaxOverride.txt.
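
A side note on the snippet above: when the override calls are enabled and parse() throws, the reset line after it is never reached, so the override stays in effect for later calls. One way to guard against that is a small try-with-resources helper, sketched below. ScopedOverride is a hypothetical class, not part of POI or Tika; it takes the setter as an IntConsumer so the sketch compiles without POI on the classpath. It does not address the checkLength() failure itself, only the cleanup.

```java
import java.util.function.IntConsumer;

/** Hypothetical helper (not part of POI or Tika): applies a setting, restores it on close. */
final class ScopedOverride implements AutoCloseable {
    private final IntConsumer setter;
    private final int resetValue;

    ScopedOverride(IntConsumer setter, int value, int resetValue) {
        this.setter = setter;
        this.resetValue = resetValue;
        setter.accept(value); // apply the override immediately
    }

    @Override
    public void close() {
        setter.accept(resetValue); // runs even if the guarded block throws
    }
}

// Assumed usage with POI on the classpath (-1 disables the override, as in the snippet above):
//
//   try (ScopedOverride o =
//           new ScopedOverride(IOUtils::setByteArrayMaxOverride, 100 * 1024 * 1024, -1)) {
//       extractParser.parse(is, h, m, c);
//   }
```

A plain try/finally around parse() achieves the same thing; the helper only saves repeating the reset at every call site.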

 

In each stack trace I have snipped out the calls to my code.

I tried attaching the two VSD files I used for testing, but they are 3+ MB each,
which caused the submission to fail.
I can email these two files directly to anyone interested in testing with them.

 

Thanks for any help you can give to resolve this,

David Shifflett

 



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

stack-trace-withOUT-setByteArrayMaxOverride.txt (6K)
stack-trace-with-setByteArrayMaxOverride.txt (17K)

Re: Problems using IOUtils.setByteArrayMaxOverride()

Dominik Stadler

Hi,

Thanks for reporting. It seems the following reproduces this nicely:

        IOUtils.setByteArrayMaxOverride(30 * 1024 * 1024);
        try {
            ByteArrayInputStream stream =
                    new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
            IOUtils.toByteArray(stream);
        } finally {
            IOUtils.setByteArrayMaxOverride(-1);
        }

This has been reported as bug https://bz.apache.org/bugzilla/show_bug.cgi?id=63569.

Thanks... Dominik.


On Thu, Dec 12, 2019 at 8:48 PM Shifflett, David [USA] <
[hidden email]> wrote:

> I am using Tika to do content extraction on Visio (vsd) files,
>
> and I am running into an ‘Unexpected RuntimeException’.
> [...]