Loading Large XLSM file using POI without Memory issue .

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Loading Large XLSM file using POI without Memory issue .

rdhanara
Hi Team,

We use below 3 POI jars in in our organization for Excel processing .

·         poi-3.8

·         poi-ooxml-3.8-20120326.jar

·         poi-ooxml-schemas-3.8-20120326.jar

Below excel formats are getting processed with the help of mentioned jar files.

·         .xls

·         .xlsx

·         .xlsm

We have a new requirement where we need to load 500MB of .xlsm file and delete the particular sheet from that workbook and save the updated excel in to the server location.

Tried something like this but, while  loading opc package into xssf work book gave us out of memory error .
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

                     OPCPackage pkg = OPCPackage.open("C:\\PROJECTS\\Test.xlsm");
                     XSSFWorkbook wb = new XSSFWorkbook(pkg);
                     SXSSFWorkbook wb1 = new SXSSFWorkbook(wb, 100);


Out Of Memory Error log:
JVMDUMP032I JVM requested System dump using 'C:\PROJECTS\ Workspace\webcontent\core.20190409.141414.14460.0001.dmp' in response to an event
JVMDUMP010I System dump written to 'C:\PROJECTS\ Workspace\webcontent \core.20190409.141414.14460.0001.dmp

We have another logic running fine where currently we are parsing same file and reading some data .

                     OPCPackage pkg = OPCPackage.open(filename);
                     XSSFReader r = new XSSFReader( pkg );
                     XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) r
                                  .getSheetsData();
                     int index = 1;
                     while (iter.hasNext()) {
                           iter.next();
                           String sheetName = iter.getSheetName();
                           if(sheetName.equals("SHEET1")){
                                  break;
                           }
                           index++;
                     }
                     SharedStringsTable sst = r.getSharedStringsTable();
                     XMLReader parser = fetchSheetParser(sst); ==> org.apache.xerces.parsers.SAXParser
                     InputStream sheet2 = r.getSheet("rId"+index);
                     InputSource sheetSource = new InputSource(sheet2);

                     parser.parse(sheetSource);
                     sheet2.close();
                     pkg.close();

From the stack overflow forum understood that one of the solution is to increase the JVM to support this .
But in my case already we have enabled maximum possible size for our JVM since our application has more no of concurrent users .So further increasing it not possible in my case.

Question: Is there any possible way available to delete particular sheet without memory issue using Apache POI  ?


Note: We are using JRE 1.8

Please provide your suggestions on this .


Thanks & Regards,
Dhanaraja.R

Reply | Threaded
Open this post in threaded view
|

Re: Loading Large XLSM file using POI without Memory issue .

pj.fanning
You can read the xslx file using the SAX approach you highlighted and you can
use a SXSSFWorkbook to write out a new xlsx file. SXSSFWorkbook is for
writing large xlsx files and it avoids keeping all the data in memory (by
storing the data in temp files).

You're probably going to have to use a newer version of POI to get all the
features (4.1.0 was just released).

There are a couple of other solutions for streaming the input but if you are
happy with the SAX approach, go with that.

https://github.com/pjfanning/poi-shared-strings-sample has 2 examples (1 for
streaming read and 1 for string write).

https://github.com/pjfanning/excel-streaming-reader-sample has an
alternative streaming read that uses an API similar to XSSFWorkbook.



--
Sent from: http://apache-poi.1045710.n5.nabble.com/POI-User-f2280730.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Loading Large XLSM file using POI without Memory issue .

Greg Woolsey
In reply to this post by rdhanara
First try with the latest POI, 4.1.0.  The version you are using, 3.8, is
very out of date. Much has changed, and I doubt you will get any help for
such an old version.

On Wed, Apr 10, 2019, 05:24 Dhanaraja, R (R.) <[hidden email]> wrote:

> Hi Team,
>
> We use below 3 POI jars in in our organization for Excel processing .
>
> ·         poi-3.8
>
> ·         poi-ooxml-3.8-20120326.jar
>
> ·         poi-ooxml-schemas-3.8-20120326.jar
>
> Below excel formats are getting processed with the help of mentioned jar
> files.
>
> ·         .xls
>
> ·         .xlsx
>
> ·         .xlsm
>
> We have a new requirement where we need to load 500MB of .xlsm file and
> delete the particular sheet from that workbook and save the updated excel
> in to the server location.
>
> Tried something like this but, while  loading opc package into xssf work
> book gave us out of memory error .
> import org.apache.poi.openxml4j.opc.OPCPackage;
> import org.apache.poi.xssf.streaming.SXSSFWorkbook;
> import org.apache.poi.xssf.usermodel.XSSFSheet;
> import org.apache.poi.xssf.usermodel.XSSFWorkbook;
>
>                      OPCPackage pkg =
> OPCPackage.open("C:\\PROJECTS\\Test.xlsm");
>                      XSSFWorkbook wb = new XSSFWorkbook(pkg);
>                      SXSSFWorkbook wb1 = new SXSSFWorkbook(wb, 100);
>
>
> Out Of Memory Error log:
> JVMDUMP032I JVM requested System dump using 'C:\PROJECTS\
> Workspace\webcontent\core.20190409.141414.14460.0001.dmp' in response to an
> event
> JVMDUMP010I System dump written to 'C:\PROJECTS\ Workspace\webcontent
> \core.20190409.141414.14460.0001.dmp
>
> We have another logic running fine where currently we are parsing same
> file and reading some data .
>
>                      OPCPackage pkg = OPCPackage.open(filename);
>                      XSSFReader r = new XSSFReader( pkg );
>                      XSSFReader.SheetIterator iter =
> (XSSFReader.SheetIterator) r
>                                   .getSheetsData();
>                      int index = 1;
>                      while (iter.hasNext()) {
>                            iter.next();
>                            String sheetName = iter.getSheetName();
>                            if(sheetName.equals("SHEET1")){
>                                   break;
>                            }
>                            index++;
>                      }
>                      SharedStringsTable sst = r.getSharedStringsTable();
>                      XMLReader parser = fetchSheetParser(sst); ==>
> org.apache.xerces.parsers.SAXParser
>                      InputStream sheet2 = r.getSheet("rId"+index);
>                      InputSource sheetSource = new InputSource(sheet2);
>
>                      parser.parse(sheetSource);
>                      sheet2.close();
>                      pkg.close();
>
> From the stack overflow forum understood that one of the solution is to
> increase the JVM to support this .
> But in my case already we have enabled maximum possible size for our JVM
> since our application has more no of concurrent users .So further
> increasing it not possible in my case.
>
> Question: Is there any possible way available to delete particular sheet
> without memory issue using Apache POI  ?
>
>
> Note: We are using JRE 1.8
>
> Please provide your suggestions on this .
>
>
> Thanks & Regards,
> Dhanaraja.R
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Loading Large XLSM file using POI without Memory issue .

r.dhanaraja@gmail.com
In reply to this post by pj.fanning
Hi Team,
  Thanks for your reply.

1.SAX Approach:
We have tried this but our XLSM file has graph included in it
<http://apache-poi.1045710.n5.nabble.com/file/t340679/Graph.png>
Also it has many tables structure included in it . When we use SAX parser we
were able to read the cell value but bringing up the original styles we
failed here.

2.poi-shared-strings-sample
url: https://github.com/pjfanning/poi-shared-strings-sample 

Tried this approach . Here also facing the same problem with the graph and
styles.

Also we are facing alignment issues. On the original file we have set of
alignment with specific cell gap. but using parser we were unable to bring
it back into the new file. looks like parser just getting the cell details
whichever having values in it .


Is there any possible way instead reading original file and writing it
freshly into the new file?
because my requirement is just to remove the sheet from the original file .

3.upgraded below jars
 -commons-collections4-4.1.jar
 - poi-4.0.0.jar
 - poi-ooxml-4.0.0.jar
 - poi-ooxml-schemas-4.0.0.jar
 - xmlbeans-3.0.2.jar

Note: Since my organisation right now allowing only till 4.0.x have not
moved to the latest on 4.1.x (Hope there wont be much difference between
these to versions)

Please provide your suggestion regarding this .

Thanks & Regards,
Dhana



--
Sent from: http://apache-poi.1045710.n5.nabble.com/POI-User-f2280730.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Loading Large XLSM file using POI without Memory issue .

rdhanara
This post was updated on .
In reply to this post by pj.fanning
Hi Team,
  Thanks for your reply.

1.SAX Approach:
We have tried this but our XLSM file has graph included in it
<http://apache-poi.1045710.n5.nabble.com/file/t340680/Graph.png

Also it has many tables structure included in it . When we use SAX parser we
were able to read the cell value but bringing up the original styles we
failed here.

2.poi-shared-strings-sample
url: https://github.com/pjfanning/poi-shared-strings-sample 

Tried this approach . Here also facing the same problem with the graph and
styles.

Also we are facing alignment issues. On the original file we have set of
alignment with specific cell gap. but using parser we were unable to bring
it back into the new file. looks like parser just getting the cell details
whichever having values in it .



3.upgraded below jars
 -commons-collections4-4.1.jar
 - poi-4.0.0.jar
 - poi-ooxml-4.0.0.jar
 - poi-ooxml-schemas-4.0.0.jar
 - xmlbeans-3.0.2.jar

Note: Since my organization right now allowing only till 4.0.x have not
moved to the latest on 4.1.x (Hope there wont be much difference between
these to versions)


Is there any possible way, instead of reading original file and writing it
freshly into the new file?
because my requirement is just to remove the sheet from the original file .


Please provide your suggestion regarding this .

Thanks & Regards,
Dhana



--
Sent from: http://apache-poi.1045710.n5.nabble.com/POI-User-f2280730.html

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Reply | Threaded
Open this post in threaded view
|

Re: Loading Large XLSM file using POI without Memory issue .

Greg Woolsey
Is the sheet to delete referenced in any formulas, names, chart series,
etc. On other sheets?

Is it XLSX? If so, you may be able to just open it as a zip file and delete
the sheet by manipulating the zip contents directly, without bothering to
parse the xml.

I would take a sample, open it in Excel, delete the sheet there, save a
copy, then compare the zip.contents with the original.

That will show you how much has to change, and whether it can be done
without a full semantic parse.

On Thu, Apr 11, 2019, 22:04 rdhanara <[hidden email]> wrote:

> Hi Team,
>   Thanks for your reply.
>
> 1.SAX Approach:
> We have tried this but our XLSM file has graph included in it
> <http://apache-poi.1045710.n5.nabble.com/file/t340680/Graph.png>
>
> Also it has many tables structure included in it . When we use SAX parser
> we
> were able to read the cell value but bringing up the original styles we
> failed here.
>
> 2.poi-shared-strings-sample
> url: https://github.com/pjfanning/poi-shared-strings-sample
>
> Tried this approach . Here also facing the same problem with the graph and
> styles.
>
> Also we are facing alignment issues. On the original file we have set of
> alignment with specific cell gap. but using parser we were unable to bring
> it back into the new file. looks like parser just getting the cell details
> whichever having values in it .
>
>
> Is there any possible way instead reading original file and writing it
> freshly into the new file?
> because my requirement is just to remove the sheet from the original file .
>
> 3.upgraded below jars
>  -commons-collections4-4.1.jar
>  - poi-4.0.0.jar
>  - poi-ooxml-4.0.0.jar
>  - poi-ooxml-schemas-4.0.0.jar
>  - xmlbeans-3.0.2.jar
>
> Note: Since my organization right now allowing only till 4.0.x have not
> moved to the latest on 4.1.x (Hope there wont be much difference between
> these to versions)
>
> Please provide your suggestion regarding this .
>
> Thanks & Regards,
> Dhana
>
>
>
> --
> Sent from: http://apache-poi.1045710.n5.nabble.com/POI-User-f2280730.html
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Loading Large XLSM file using POI without Memory issue .

rdhanara
Hi,

Please find my answers

1.Is the sheet to delete referenced in any formulas, names, chart series,
etc. On other sheets?
 Ans: *No. There is no dependency with any other sheet *

2.Is it XLSX?
 Ans : It is *.XLSM format* ( Mine has set of macros available in it )

3.Hope your suggestion on opening the sheet as zip and delete the same would
work for .xlsm formats as well.
 If so please provide some sample or reference document so that I can give a
try on it .

Thanks & Regards,
Dhana




--
Sent from: http://apache-poi.1045710.n5.nabble.com/POI-User-f2280730.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Loading Large XLSM file using POI without Memory issue .

pj.fanning
Xlsx and Xlsm files are zip files. You could try using Java ZipFile or
ZipInputStream to read the file.
Most of the files in the zip are XML files.
You could filter out the file that has the sheet data for the sheet that you
want to delete.

It's quite likely that some of the other XML files will have references to
the sheet you are trying to remove. You may get away with leaving some of
these references behind but some might cause issues.

Greg has suggested that you take a sample xlsm file and modify it in Excel
to remove the sheet and save that as a new file. You can unzip the original
and new files and diff them to see what you you will need to automate.



--
Sent from: http://apache-poi.1045710.n5.nabble.com/POI-User-f2280730.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Loading Large XLSM file using POI without Memory issue .

rdhanara
Hi,

As suggested trying to do below steps,

1. Unzip the large xlsm file
2.Delete the Sheet as per requirement
3.repack the source once again to xlsm file

I have done step 1 & 3 as follow. Please have a look and correct me if i am
going in wrong direction ..
***************
*1.Tried unzipping the large xlsm file as follows*

String zipFilePath = "C:\\PROJECTS\\Source.xlsm";
               
String destDir = "C:\\PROJECTS\\unzip";


 private static void unzip(String zipFilePath, String destDir) {
                File dir = new File(destDir);
                // create output directory if it doesn't exist
                if(!dir.exists())
                dir.mkdirs();
                FileInputStream fis;
                //buffer for read and write data to file
                byte[] buffer = new byte[1024];
                try {
                    fis = new FileInputStream(zipFilePath);
                    ZipInputStream zis = new ZipInputStream(fis);
                    ZipEntry ze = zis.getNextEntry();
                    while(ze != null){
                        String fileName = ze.getName();
                        File newFile = new File(destDir + File.separator +
fileName);
                        System.out.println("Unzipping to
"+newFile.getAbsolutePath());
                        //create directories for sub directories in zip
                        new File(newFile.getParent()).mkdirs();
                        FileOutputStream fos = new FileOutputStream(newFile);
                        int len;
                        while ((len = zis.read(buffer)) > 0) {
                        fos.write(buffer, 0, len);
                        }
                        fos.close();
                        //close this ZipEntry
                        zis.closeEntry();
                        ze = zis.getNextEntry();
                    }
                    //close last ZipEntry
                    zis.closeEntry();
                    zis.close();
                    fis.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
               
            }

Once after executing above program it extract xlsm content as follows
<http://apache-poi.1045710.n5.nabble.com/file/t340680/unzip.png>

*********************

*2. Repacking *

ZipOutputStream out = new ZipOutputStream(new
FileOutputStream("C:\\PROJECTS\\TEST.xlsm"));

              Files.walkFileTree(Paths.get("C:\\PROJECTS\\unzip"), new
SimpleFileVisitor<Path>() {
                  public FileVisitResult visitFile(Path file,
BasicFileAttributes attrs) throws IOException {
                 out.putNextEntry(new
ZipEntry(Paths.get("C:\\PROJECTS\\unzip").relativize(file).toString()));
                      Files.copy(file, out);
                      out.closeEntry();
                      return FileVisitResult.CONTINUE;
                  }
              });

If I open TEST.xlsm  I am able to see the original xlsm file content here..

But as per the suggestion before repacking i need to delete the expected
sheet and then repack it.
 I am trying to parse this file and delete the sheet tag..

*\xl\workbook.xml*

Will update you shortly.


*Note: If you have any better approach than the one i mentioned here do let
me know ..*




--
Sent from: http://apache-poi.1045710.n5.nabble.com/POI-User-f2280730.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Loading Large XLSM file using POI without Memory issue .

rdhanara
Finally found a way to parse xml and delete the reference of the sheet which
i planned to delete from my excel workbook

*To delete the unwanted sheet data from workbook :*

          public static void delete_Merged_report_sheet(String destDir) {
                  String merged_rprt_sheet_Path = destDir+"\\xl\\worksheets\\sheet3.xml";
                  File merged_sheet = new File(merged_rprt_sheet_Path);
                  merged_sheet.delete();
          }
*******************************
*To remove the deleted sheet references from the worksheet :*
          public static void modifyXml(String destDir)  {
                  try {
                  String filepath = destDir+"\\xl\\workbook.XML";
                        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
                        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
                        Document doc = docBuilder.parse(filepath);
                       
                        Node sheets = doc.getElementsByTagName("sheets").item(0);
                        NodeList list = sheets.getChildNodes();
                        for (int i = 0; i < list.getLength(); i++) {
                            Node node = list.item(i);
                            NamedNodeMap attributes =   node.getAttributes();
                            attributes.getNamedItem("r:id").getTextContent();
                            if ("rId3".equals(attributes.getNamedItem("r:id").getTextContent()))
{
                            sheets.removeChild(node);
                                   }
                        }
                        // write the content into xml file
                        TransformerFactory transformerFactory = TransformerFactory.newInstance();
                        Transformer transformer = transformerFactory.newTransformer();
                        DOMSource source = new DOMSource(doc);
                        StreamResult result = new StreamResult(new File(filepath));
                        transformer.transform(source, result);
                  } catch (ParserConfigurationException pce) {
                                pce.printStackTrace();
                           } catch (TransformerException tfe) {
                                tfe.printStackTrace();
                           } catch (IOException ioe) {
                                ioe.printStackTrace();
                           } catch (SAXException sae) {
                                sae.printStackTrace();
                           }
                }
         

*Thanks everyone for helping me in identifying the solution without memory
issue .. *




--
Sent from: http://apache-poi.1045710.n5.nabble.com/POI-User-f2280730.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]