Apache POI - Detecting difference between an xlsx file and a normal zip file


Apache POI - Detecting difference between an xlsx file and a normal zip file

Thiyagarajan
Hi,
 I have an InputStream wrapped in a BufferedInputStream, and I'm trying to detect whether it is a normal zip file or an xlsx file (and take appropriate action accordingly). I have tried to use hasOOXMLHeader for this, but it only checks whether the input stream is a zip file; it has nothing specific to xlsx. (I understand that an xlsx file is a zip file containing a bunch of XML files.)

Is it possible to detect whether the InputStream comes from an xlsx file or a normal zip file?

Regards,
B.Thiyagarajan

Re: Apache POI - Detecting difference between an xlsx file and a normal zip file

Nick Burch-2
On Wed, 22 Mar 2017, Thiyagarajan wrote:

> I have an InputStream wrapped in a BufferedInputStream and I'm trying to
> detect whether it is a normal zip file or an xlsx file (and take appropriate
> action accordingly). I have tried to use hasOOXMLHeader
> <https://github.com/apache/poi/blob/trunk/src/java/org/apache/poi/poifs/filesystem/DocumentFactoryHelper.java#L91>
> to achieve this, but it only checks whether the input stream is a zip file;
> there is nothing specific to xlsx there. (I understand that an xlsx file is
> a zip file containing a bunch of XML files.)
>
> Is it possible to detect whether the InputStream comes from an xlsx file or
> a normal zip file?

Not easily from an InputStream. To have a good idea, you'd need to check
whether the zip contains a [Content_Types].xml entry, and that may not be
the first entry in the zip. So you'll need to buffer the zip stream into
memory, walk through the entries to see whether a content types entry is
present, and then rewind the stream to process it if so. It's much easier
with a File, as you can use random access to look the entry up directly
without buffering.
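A minimal sketch of the buffer, scan, and rewind approach described above, using only java.util.zip. The class name OoxmlStreamSniffer, the method hasContentTypes, and the 16 MB MAX_BUFFER limit are assumptions made for illustration; they are not part of POI. Note that finding [Content_Types].xml only indicates an OOXML (OPC) package in general, so a docx or pptx would also match.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class OoxmlStreamSniffer {
    // Assumed upper bound on how many bytes may be buffered for the rewind.
    private static final int MAX_BUFFER = 16 * 1024 * 1024;

    /**
     * Returns true if the zip stream contains a [Content_Types].xml entry.
     * The stream is rewound to its starting position before returning.
     * reset() throws an IOException if more than MAX_BUFFER bytes were read.
     */
    static boolean hasContentTypes(BufferedInputStream in) throws IOException {
        in.mark(MAX_BUFFER);
        try {
            // Deliberately not closed: closing would also close the caller's stream.
            ZipInputStream zip = new ZipInputStream(in);
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("[Content_Types].xml".equals(entry.getName())) {
                    return true;
                }
            }
            return false;
        } finally {
            in.reset();
        }
    }
}
```

After the call, the same BufferedInputStream can be handed on to whichever reader fits the result. For large inputs the mark limit would have to be raised, which is the memory cost mentioned above.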

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Apache POI - Detecting difference between an xlsx file and a normal zip file

Thiyagarajan
Hi,
Say I have the actual File object of the xlsx file, what kind of random-access do you mean?
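One way to read "random access" here is java.util.zip.ZipFile, which reads the zip's central directory and can look an entry up by name without scanning the whole archive. A minimal sketch under that assumption follows. The class OoxmlFileSniffer, the Kind enum, and the substring check on the workbook content type are illustrative choices, not POI API; the substring match is a heuristic rather than a real XML parse, and it does not cover macro-enabled (.xlsm) workbooks. It uses InputStream.readAllBytes, so it needs Java 9 or later.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class OoxmlFileSniffer {
    // Content type of the main workbook part in a plain .xlsx package.
    private static final String XLSX_MAIN =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml";

    enum Kind { XLSX, OTHER_OOXML, PLAIN_ZIP }

    /** Classifies a zip file; ZipFile throws a ZipException if the file is not a zip at all. */
    static Kind classify(File file) throws IOException {
        try (ZipFile zip = new ZipFile(file)) {
            // Direct lookup by name via the central directory.
            ZipEntry types = zip.getEntry("[Content_Types].xml");
            if (types == null) {
                return Kind.PLAIN_ZIP;
            }
            try (InputStream in = zip.getInputStream(types)) {
                // Heuristic: assumes the content types part is UTF-8 encoded.
                String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                return xml.contains(XLSX_MAIN) ? Kind.XLSX : Kind.OTHER_OOXML;
            }
        }
    }
}
```

Calling OoxmlFileSniffer.classify(new File("book.xlsx")) (a hypothetical path) would then tell the caller which branch to take before opening the file with POI or treating it as an ordinary archive.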

Re: Apache POI - Detecting difference between an xlsx file and a normal zip file

Javen O'Neal
You could also use Apache Tika to detect the file type.

On Mar 22, 2017 11:15, "Thiyagarajan" <[hidden email]> wrote:

> Hi,
> Say I have the actual File object of the xlsx file, what exactly do you
> mean by random-access?
>
>
>
> --
> View this message in context: http://apache-poi.1045710.n5.nabble.com/Apache-POI-Detecting-difference-between-an-xlsx-file-and-a-normal-zip-file-tp5727035p5727037.html
> Sent from the POI - User mailing list archive at Nabble.com.