[Bug 63774] New: Custom properties "add" method is slow

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63774

            Bug ID: 63774
           Summary: Custom properties "add" method is slow
           Product: POI
           Version: 4.0.0-FINAL
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XSSF
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

Dear developers,

We have recently run into an issue after moving to the latest release. In one
of our use cases we set a large number of custom properties on an Excel (XLSX)
file, and it seems that something changed in recent versions that made this
process much slower. Consider the following code:


import java.io.FileInputStream;
import java.io.IOException;
import java.util.stream.IntStream;

import org.apache.poi.ooxml.POIXMLProperties;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class PoiTest {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes both the stream and the workbook.
        try (FileInputStream fis = new FileInputStream("test.xlsx");
             XSSFWorkbook workbook = new XSSFWorkbook(fis)) {
            POIXMLProperties.CustomProperties properties =
                    workbook.getProperties().getCustomProperties();

            // Time each addProperty call individually.
            IntStream.range(0, 10000).forEach(i -> {
                long ts = System.currentTimeMillis();
                properties.addProperty("property" + i, "value" + i);
                System.out.println(i + " time taken " + (System.currentTimeMillis() - ts));
            });
        }
    }
}

You will see that the time per call keeps increasing. My investigation points
to the call to the 'nextPid()' method in the 'CustomProperties' class (defined
in 'POIXMLProperties'): every time it is called, it iterates over the whole
property list to find the highest pid in use. Since we add properties
sequentially, we have implemented a workaround in our code base that caches
the pid. For the more general case I am not sure what the solution should be
(if any), but past a certain point performance deteriorates so much that the
application becomes unusable.
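
To illustrate the difference between scanning for the highest pid and caching it, here is a minimal sketch. It does not use POI: the class PidCacheSketch and its methods are hypothetical stand-ins, and the assumption that the first pid is 2 is made only for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a custom-property store; this is NOT the POI class.
class PidCacheSketch {
    // Pids assigned so far, in insertion order.
    private final List<Integer> pids = new ArrayList<>();
    // Highest pid handed out so far. Starting at 1 (first pid is 2) is an assumption.
    private int lastPid = 1;

    // Scanning approach: walks every entry on each call, so n adds cost O(n^2) overall.
    int nextPidByScan() {
        int max = 1;
        for (int pid : pids) {
            if (pid > max) {
                max = pid;
            }
        }
        return max + 1;
    }

    // Cached approach: constant time, valid only while every add goes through add().
    int nextPidCached() {
        return lastPid + 1;
    }

    // Adds an entry using the cached pid and returns that pid.
    int add() {
        int pid = nextPidCached();
        pids.add(pid);
        lastPid = pid;
        return pid;
    }

    public static void main(String[] args) {
        PidCacheSketch store = new PidCacheSketch();
        for (int i = 0; i < 10000; i++) {
            store.add();
        }
        // Both approaches should agree on the next pid.
        System.out.println(store.nextPidByScan() == store.nextPidCached());
    }
}
```

The cached value stays correct only if nothing else modifies the list behind its back; a store that also supports removal or bulk loading would need to refresh the cache in those paths.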

Best regards,
Serhiy.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[Bug 63774] Custom properties "add" method is slow

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63774

PJ Fanning <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #1 from PJ Fanning <[hidden email]> ---
Added a solution that tracks the last pid, in r1867597.

[Bug 63774] Custom properties "add" method is slow

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63774

--- Comment #2 from serhiy <[hidden email]> ---
Hello, sorry for replying to a resolved issue, but I still have some concerns
about the proposed solution.

I am not really familiar with the XLSX format, so I do not know whether "pid"
must be unique. If it must, it may be better to use AtomicInteger or something
similar, because under concurrent execution two different properties could end
up with the same pid. Otherwise, ignore this comment.

My second concern is the "if(contains(name))" check in the "add" method.
According to my experiments, this method is also slow and leads to poor
performance once Custom Properties reaches around 2000 entries. I therefore
wonder whether the names should also be cached in a temporary set to improve
performance.

Sorry once again for bumping the issue.

Best regards,
Serhiy.
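
A minimal sketch of the two suggestions in this comment, assuming pids must be unique: an AtomicInteger hands out pids and a concurrent set answers the name check in roughly constant time. The class NameIndexSketch and its methods are hypothetical and are not part of POI; throwing IllegalArgumentException on a duplicate name is a choice made for this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical index kept alongside a property list; this is NOT the POI class.
class NameIndexSketch {
    // Names already in use; lookups do not scan the property list.
    private final Set<String> names = ConcurrentHashMap.newKeySet();
    // Last pid handed out. Starting at 1 (first pid is 2) is an assumption.
    private final AtomicInteger lastPid = new AtomicInteger(1);

    boolean contains(String name) {
        return names.contains(name);
    }

    // Registers the name and returns a fresh pid; rejects duplicate names.
    int add(String name) {
        if (!names.add(name)) {
            throw new IllegalArgumentException("Property already exists: " + name);
        }
        // incrementAndGet is atomic, so two threads never receive the same pid.
        return lastPid.incrementAndGet();
    }

    // Demo helper: two threads add distinct names; returns the number of distinct pids.
    static int distinctPidsFromTwoThreads(int perThread) {
        NameIndexSketch index = new NameIndexSketch();
        Set<Integer> pids = ConcurrentHashMap.newKeySet();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < perThread; i++) {
                pids.add(index.add("a" + i));
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < perThread; i++) {
                pids.add(index.add("b" + i));
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pids.size();
    }

    public static void main(String[] args) {
        // With unique pids, the count equals the total number of adds.
        System.out.println(distinctPidsFromTwoThreads(1000) == 2000);
    }
}
```

This only covers pid allocation and the name check. Whether the underlying property list tolerates concurrent modification is a separate question that the sketch does not address.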
