How to check for valid excel files using POI without checking the file extension


James Geroge
Hi Friends,
Is there a way to tell whether a file is an Excel file without relying on the file extension? Users can send Excel files with names like the ones below.
Test
Test.xls
Test.xlsx
Test.xlsxxlsx (by renaming the file using Windows Explorer)
Test.xlsabcd (by renaming

Thanks,
James George.
Re: How to check for valid excel files using POI without checking the file extension

Antoni Mylka-2
On 2010-04-19 13:49, James Geroge wrote:
>
> Hi Friends,
> Is there a way to know the file is an excel file without manipulating the
> file extension, as the users can send the excel files in format like below.
> Test
> Test.xls
> Test.xlsx
> Test.xlsxxlsx(by renaming the file using windows explorer)
> Test.xlsabcd (by renaming

You can use a MIME type detector. Good ones are available in the Aperture
Framework (disclaimer: I'm Aperture's maintainer :) ) and in Apache Tika.

AFAIK Apache Tika is currently better at recognizing xlsx files when the
extension is missing. We're working on that too.

Antoni Mylka


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: How to check for valid excel files using POI without checking the file extension

Paul Spencer-3
In reply to this post by James Geroge
James,
You can open the file using the SS UserModel. See http://poi.markmail.org/message/ejihftiztkrifcvq?q=from:%22Paul+Spencer%22

Paul Spencer


On Apr 19, 2010, at 7:49 AM, James Geroge wrote:

>
> Hi Friends,
> Is there a way to know the file is an excel file without manipulating the
> file extension, as the users can send the excel files in format like below.
> Test
> Test.xls
> Test.xlsx
> Test.xlsxxlsx(by renaming the file using windows explorer)
> Test.xlsabcd (by renaming
>
> Thanks,
> James George.
> --
> View this message in context: http://old.nabble.com/How-to-check-for-valid-excel-files-using-POI-without-checking-the-file-extension-tp28287650p28287650.html
> Sent from the POI - Dev mailing list archive at Nabble.com.

Re: How to check for valid excel files using POI without checking the file extension

Mark Beardsley
In reply to this post by James Geroge
Hello James,

The most obvious answer is the WorkbookFactory class - http://poi.apache.org/apidocs/org/apache/poi/ss/usermodel/WorkbookFactory.html - if you have a valid Excel workbook, it will return an instance of either the XSSFWorkbook or the HSSFWorkbook class. That does impose some overhead, of course, because the Excel file is effectively opened, which can take a few moments and tie up some memory.

The other option is to look at the file header, the first few bytes of the file. There is a website - filext.com - that provides this sort of information. For example, here is the information for the .xls file format http://filext.com/file-extension/XLS and here is the same for .xlsx http://filext.com/file-extension/xlsx. In essence, you would open a stream onto the file, read the first few bytes and see whether they match either pattern; but I do not know whether this is an entirely fail-safe option.
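As a rough illustration of the header check described above, here is a minimal sketch that uses only the JDK. The class name ExcelHeaderSniffer, the Container enum and the sniff method are hypothetical names chosen for this example; they are not part of POI. The two signatures are the well-known ones for OLE2 compound documents (the container used by .xls) and for ZIP archives (the container used by .xlsx). Note that a match only identifies the container: a .doc file is also OLE2, and a .docx or an ordinary .zip also starts with the ZIP signature, so this check can reject obvious non-Excel files cheaply but cannot by itself confirm that a file is a workbook.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of Apache POI.
public class ExcelHeaderSniffer {

    /** Container formats that can be told apart from the leading bytes alone. */
    public enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 compound document signature (used by .xls, but also by .doc and .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\003\004" (used by .xlsx, .docx, .zip, ...).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Reads up to eight bytes from the stream and classifies the container. */
    public static Container sniff(InputStream in) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int filled = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (filled < header.length) {
            int n = in.read(header, filled, header.length - filled);
            if (n < 0) {
                break;
            }
            filled += n;
        }
        if (startsWith(header, filled, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (startsWith(header, filled, ZIP_MAGIC)) {
            return Container.ZIP;
        }
        return Container.UNKNOWN;
    }

    // True if the first 'length' valid bytes of 'data' begin with 'prefix'.
    private static boolean startsWith(byte[] data, int length, byte[] prefix) {
        if (length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Classifies every path given on the command line, whatever its extension.
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                System.out.println(path + ": " + sniff(in));
            }
        }
    }
}
```

One way to combine the two suggestions is to use a check like this as a cheap pre-filter and to call WorkbookFactory only for files that come back as OLE2 or ZIP, since only actually opening the file confirms that it is a workbook.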

Yours

Mark B

PS. You have posted this to the dev list when it really ought to be posted to the user list. The dev list is where you would post if you were experiencing problems with the API - for example a particular file provoking exceptions - or if you wanted to ask for an enhancement. Furthermore, fewer people read the dev list, so you are reducing your chances of receiving a response to your question.


Re: How to check for valid excel files using POI without checking the file extension

James Geroge
Hello Mark,
Thanks for your suggestions.
I tried to provoke a NullPointerException with the WorkbookFactory class, but that did not work, so I used a try/catch instead and was able to handle the requirement I had.

Thanks for the other suggestions too.

The code is below:
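    // Needs org.apache.poi.ss.usermodel.WorkbookFactory on the classpath.
    // 'input' is assumed to be an InputStream opened on the uploaded file.
    try {
        // create() inspects the stream contents, not the file name, and
        // returns an HSSFWorkbook (.xls) or an XSSFWorkbook (.xlsx).
        if (WorkbookFactory.create(input) != null) {
            log("GOOD FILE");
        } else {
            log("Invalid input file or not a valid Excel file");
        }
    } catch (Exception e1) {
        // Content that is not a recognizable workbook ends up here:
        // create() throws for it, which is why the try/catch is needed.
        // e1.printStackTrace();
        log("Invalid input file or not a valid Excel file");
        return; // no need to process the file if it is not an Excel workbook
    }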

Thanks,
James George

Re: How to check for valid excel files using POI without checking the file extension

Mark Beardsley

You're welcome, James. All the best with your project and, if you need any
further help, just drop a message onto the list.

Yours

Mark B


--
View this message in context: http://old.nabble.com/How-to-check-for-valid-excel-files-using-POI-without-checking-the-file-extension-tp28287650p28287704.html
Sent from the POI - Dev mailing list archive at Nabble.com.


Re: How to check for valid excel files using POI without checking the file extension

James Geroge
In reply to this post by James Geroge

Hello Mark,
I appreciate your help.

Regards,
JG

--
View this message in context: http://old.nabble.com/How-to-check-for-valid-excel-files-using-POI-without-checking-the-file-extension-tp28287650p28287705.html
Sent from the POI - Dev mailing list archive at Nabble.com.

