[GitHub] poi pull request #54: Add Image Optimisations

BruceKuiLiu
GitHub user thelmstedt opened a pull request:

    https://github.com/apache/poi/pull/54

    Add Image Optimisations

    I need to be able to generate spreadsheets with 2000 images fast enough for a synchronous HTTP request. `3.16` takes ~25 seconds for this use case for me. These changes take it down to ~1 second. I've added a test for my case, and I don't get any more failures than `trunk`. I don't think I've broken any invariants, but it's definitely worth a second look!
   
    The slowdown was caused by the cost of creating and sorting `PackagePartName`s. I assume that is part of the OOXML spec, so there's no avoiding the overhead. But `addPicture` happened to use them redundantly:
    * adding a new relationship enumerated all current relationships, building a `PackagePartName` for each
    * PackageParts were stored as a `TreeMap<PackagePartName, PackagePart>`
   
    Instead we
    * cache relationship lookups by name (similar to what is already done for ID and type)
    * store PackageParts in a HashMap for quick lookups, and explicitly sort its `.values()`
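
    The two changes listed above can be sketched roughly as follows. This is a minimal illustration and not the code in the pull request: `LookupSketch`, `SortedOnReadParts`, `RelationshipIndex`, `sortedValues`, `addOrFind` and the `rId` naming are all hypothetical, and plain `String` keys with natural `String` ordering stand in for `PackagePartName`.

    ```java
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch only; these are NOT POI's classes or method names.
    class LookupSketch {

        /** Parts live in a HashMap; the ordering cost is paid only when values are read in order. */
        static final class SortedOnReadParts<K extends Comparable<K>, V> {
            private final Map<K, V> parts = new HashMap<>();

            // No key comparisons on insert, unlike a TreeMap.
            V put(K name, V part) { return parts.put(name, part); }

            V get(K name) { return parts.get(name); }

            // Sort the keys on demand and return the values in key order.
            List<V> sortedValues() {
                List<K> names = new ArrayList<>(parts.keySet());
                Collections.sort(names);
                List<V> ordered = new ArrayList<>(names.size());
                for (K name : names) {
                    ordered.add(parts.get(name));
                }
                return ordered;
            }
        }

        /** Relationships indexed by target name, so a lookup does not scan every relationship. */
        static final class RelationshipIndex {
            private final Map<String, String> idByTarget = new HashMap<>();

            // Returns the id already registered for this target, or registers a new one.
            String addOrFind(String target) {
                String existing = idByTarget.get(target);
                if (existing != null) {
                    return existing;
                }
                String id = "rId" + (idByTarget.size() + 1); // assumed id scheme, for illustration
                idByTarget.put(target, id);
                return id;
            }

            int size() { return idByTarget.size(); }
        }

        public static void main(String[] args) {
            SortedOnReadParts<String, String> parts = new SortedOnReadParts<>();
            parts.put("/xl/media/image2.png", "second");
            parts.put("/xl/media/image1.png", "first");
            System.out.println(parts.sortedValues()); // prints [first, second]

            RelationshipIndex rels = new RelationshipIndex();
            String a = rels.addOrFind("/xl/media/image1.png");
            String b = rels.addOrFind("/xl/media/image1.png");
            System.out.println(a.equals(b) + " " + rels.size()); // prints true 1
        }
    }
    ```

    The trade-off in this sketch is that every ordered read pays for a full sort, so the approach suits workloads with many insertions and few ordered reads, such as adding many images before writing the file once.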
   
    The first commit adds a benchmark using JMH (http://openjdk.java.net/projects/code-tools/jmh/).
   
    Prior to my changes `addPicture` gets:
   
    ```
    # Run complete. Total time: 00:00:31
   
    Benchmark                                                          Mode  Cnt        Score         Error   Units
    AddImageBench.benchCreatePicture                                   avgt   10     2831.586 ±      38.824   us/op
    AddImageBench.benchCreatePicture:·gc.alloc.rate                    avgt   10      810.418 ±      22.303  MB/sec
    AddImageBench.benchCreatePicture:·gc.alloc.rate.norm               avgt   10  2407955.352 ±   33327.581    B/op
    AddImageBench.benchCreatePicture:·gc.churn.PS_Eden_Space           avgt   10      847.676 ±     361.511  MB/sec
    AddImageBench.benchCreatePicture:·gc.churn.PS_Eden_Space.norm      avgt   10  2520570.616 ± 1084187.937    B/op
    AddImageBench.benchCreatePicture:·gc.churn.PS_Survivor_Space       avgt   10        0.561 ±       0.645  MB/sec
    AddImageBench.benchCreatePicture:·gc.churn.PS_Survivor_Space.norm  avgt   10     1667.673 ±    1912.256    B/op
    AddImageBench.benchCreatePicture:·gc.count                         avgt   10       16.000                counts
    AddImageBench.benchCreatePicture:·gc.time                          avgt   10       69.000                    ms
    AddImageBench.benchCreatePicture:·stack                            avgt               NaN                   ---
    ```
   
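    Afterwards we get roughly a 12x improvement in average execution time (2831.586 to 227.339 us/op) and roughly an 86x reduction in allocation per operation (`gc.alloc.rate.norm`, 2407955.352 to 28021.776 B/op):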
   
    ```
    # Run complete. Total time: 00:00:31
   
    Benchmark                                                          Mode  Cnt      Score       Error   Units
    AddImageBench.benchCreatePicture                                   avgt   10    227.339 ±    49.226   us/op
    AddImageBench.benchCreatePicture:·gc.alloc.rate                    avgt   10    119.667 ±    25.859  MB/sec
    AddImageBench.benchCreatePicture:·gc.alloc.rate.norm               avgt   10  28021.776 ±    54.539    B/op
    AddImageBench.benchCreatePicture:·gc.churn.PS_Eden_Space           avgt   10     98.653 ±   314.433  MB/sec
    AddImageBench.benchCreatePicture:·gc.churn.PS_Eden_Space.norm      avgt   10  19826.075 ± 63192.153    B/op
    AddImageBench.benchCreatePicture:·gc.churn.PS_Survivor_Space       avgt   10      0.228 ±     1.090  MB/sec
    AddImageBench.benchCreatePicture:·gc.churn.PS_Survivor_Space.norm  avgt   10     45.594 ±   217.979    B/op
    AddImageBench.benchCreatePicture:·gc.count                         avgt   10      2.000              counts
    AddImageBench.benchCreatePicture:·gc.time                          avgt   10     88.000                  ms
    AddImageBench.benchCreatePicture:·stack                            avgt             NaN                 ---
    ```
   
    Happy to back out the benchmark inclusion if you don't want to include another test dependency.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thelmstedt/poi feature/redo

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/poi/pull/54.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #54
   
----
commit c26b958ac32c20226db4cb41fb7dda8bc3e9a34f
Author: Tim Helmstedt <[hidden email]>
Date:   2016-10-23T20:59:16Z

    Benchmark adding images

commit 1d7cf3574016e64e0631556bb50cb466a930c18f
Author: Tim Helmstedt <[hidden email]>
Date:   2016-10-22T11:06:53Z

    PackageRelationshipCollection caches lookup by targetPart
   
    Building partnames for all relationships is expensive. Here we avoid
    this in findExistingRelation, which is used every time we add a relation
    to a DocumentPart.

commit 0fc24637893758abb9f188ea451844bc703f33d1
Author: Tim Helmstedt <[hidden email]>
Date:   2016-10-24T07:28:29Z

    Drawing test

commit d4f01a949f0f4dc9b34bdb32ef54e7a3e6f37f37
Author: Tim Helmstedt <[hidden email]>
Date:   2017-05-15T04:57:16Z

    PackagePartCollection optimisations
   
    Instead of extending a lookup TreeMap and incurring the natural ordering
    cost for each insertion, we wrap a HashMap and ensure calls to .values()
    are sorted.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

BruceKuiLiu
GitHub user asfgit closed the pull request at:

    https://github.com/apache/poi/pull/54

