[Bug 57840] New: [PATCH] Support for structured references with Excel tables.

classic Classic list List threaded Threaded
50 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[Bug 57840] New: [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

            Bug ID: 57840
           Summary: [PATCH]  Support for structured references with Excel
                    tables.
           Product: POI
           Version: 3.11-FINAL
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: XSSF
          Assignee: [hidden email]
          Reporter: [hidden email]

Created attachment 32671
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32671&action=edit
Structured reference support for POI

Added new support for the Structured Reference syntax that was introduced in MS
Excel 2007.
You can find more information about structured references in :
https://support.office.com/en-us/article/Using-structured-references-with-Excel-tables-F5ED2452-2337-4F71-BED3-C8AE6D2B276E
The new patch adds the ability of parsing structured references (Only works in
XSSF as in HSSF you don't have tables) and converting it to normal (and well
implemented) Area Reference.
Also added support for Indirect evaluations with structured references.

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|3.11-FINAL                  |3.12-dev

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |PatchAvailable

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

Vlad Skarzhevskyy <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

--- Comment #1 from GW <[hidden email]> ---
This is great, just what I needed for

https://bz.apache.org/bugzilla/show_bug.cgi?id=57721

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

GW <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |57721

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

--- Comment #2 from Javen O'Neal <[hidden email]> ---
attachment 32671 has merge conflicts against the trunk. Additional work will be
needed to bring this back to something that can be committed to the trunk.

A few more unit tests are probably also needed for the new functionality that
were added to make sure we don't break structured references in the future.

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

--- Comment #3 from GW <[hidden email]> ---
I have this working against trunk, all tests pass, but no new tests yet.  I'll
work on a patch today and tomorrow with tests and some tweaks to the original
patch. For one, it had company network specific changes to build.xml that
leaked into the patch (I've removed them). There is some code that could
benefit from turning a series of constants into an Enum, and it performs very
slowly for large workbooks/large tables.  I can see a couple of things that
would be quick, small changes for some big gains.  Again, with tests.

First, though, I have to test against 3.12, as a library I use (Vaadin
Spreadsheet) doesn't like something that changed in the API between 3.12 and
trunk, as that's where my POC has to work for my day job :)

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

--- Comment #4 from Nick Burch <[hidden email]> ---
(In reply to GW from comment #3)
> First, though, I have to test against 3.12, as a library I use (Vaadin
> Spreadsheet) doesn't like something that changed in the API between 3.12 and
> trunk, as that's where my POC has to work for my day job :)

We've tried reaching out to the Vaadin Spreadsheet folks a few times now, but
never had any response. If you have contacts there, please ping them to get in
touch! (IIRC it was conditional formatting stuff we were most recently
interested in collaborating on)

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

--- Comment #5 from GW <[hidden email]> ---
(In reply to Nick Burch from comment #4)
> (In reply to GW from comment #3)
> > First, though, I have to test against 3.12, as a library I use (Vaadin
> > Spreadsheet) doesn't like something that changed in the API between 3.12 and
> > trunk, as that's where my POC has to work for my day job :)
>
> We've tried reaching out to the Vaadin Spreadsheet folks a few times now,
> but never had any response. If you have contacts there, please ping them to
> get in touch! (IIRC it was conditional formatting stuff we were most
> recently interested in collaborating on)

I will do that.  I'm lobbying to go all-in with POI and Vaadin Spreadsheet for
the next big thing for my new gig.  I've been a user of POI for over 10 years,
and Vaadin for 5.  If I get my way, we will be paying Vaadin customers and can
have some say in product direction.

For example, I found a bug in their conditional formatting stuff yesterday :)
After I finish the patch unit tests, I'll be filing a bug for them about it.
It may relate to Excel Table structured reference syntax in the conditional
formatting formula, in which case they definitely should be talking to this
team!

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

GW <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|Linux                       |All
            Version|3.12-dev                    |3.15-dev
           Hardware|PC                          |All

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

GW <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #32671|0                           |1
        is obsolete|                            |

--- Comment #6 from GW <[hidden email]> ---
Created attachment 33927
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33927&action=edit
Trunk patch adding support for Structured References in formulas, with unit
tests

The new patch (same date as comment) is against TRUNK, and contains new passing
unit tests to exercise the new code and notice new bugs.

The original patch was missing some corner cases around special characters and
escapes in table names, now covered by tests and handled by the code.

(why does Excel allow named ranges and tables to start with a backslash!?)

There are new comments for some existing and some new code that is there to
improve execution performance at the cost of static cached metadata - changes
to the underlying OOXML objects may not be seen unless the cached data is
reset.  There was already some of that, but it wasn't called out in the JavaDoc
comments.  Not likely in normal use, as most uses are either building or
reading, but not building, reading, modifying, then calculating again, but
since the underlying objects are exposed through public setters, it can happen.

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #7 from Javen O'Neal <[hidden email]> ---
Thanks for the quick turn around Greg!
I'm slowly reviewing what you have. I committed StructuredReferences.xlsx in
r1747482.
I also added svn:eol-style=native to all the files in your patch so the diffs
are shorter when reviewing these changes on a different OS.
The import change on Match.java isn't needed.

1. Could you change String Table.isStructuredReference to a compiled Pattern
for performance?
2. We're trying to move all junit3 tests to junit4 tests. Could you update
TestStructuredReferences.java to use junit4 (org.junit.Test with @Test
decorators, see TestXSSFFormulaParser.java for an example)
3. The changes to XSSFRowShifter looks like formulas with structured references
cannot be row-shifted. Is that correct? If so, could you either fix that in
your patch or open a new bug that depends on bug 57840?
4. Add a javadoc to XSSFWorkbook.getTable. This method rebuilds the table cache
and returns a single item. If the cache isn't used elsewhere, then a
non-caching linear search would be faster and use less memory.

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

GW <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #33927|0                           |1
        is obsolete|                            |

--- Comment #8 from GW <[hidden email]> ---
Created attachment 33932
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33932&action=edit
Updated patch

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

GW <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #9 from GW <[hidden email]> ---
(In reply to Javen O'Neal from comment #7)

> Thanks for the quick turn around Greg!
> I'm slowly reviewing what you have. I committed StructuredReferences.xlsx in
> r1747482.
> I also added svn:eol-style=native to all the files in your patch so the
> diffs are shorter when reviewing these changes on a different OS.
> The import change on Match.java isn't needed.
>
> 1. Could you change String Table.isStructuredReference to a compiled Pattern
> for performance?
> 2. We're trying to move all junit3 tests to junit4 tests. Could you update
> TestStructuredReferences.java to use junit4 (org.junit.Test with @Test
> decorators, see TestXSSFFormulaParser.java for an example)
> 3. The changes to XSSFRowShifter looks like formulas with structured
> references cannot be row-shifted. Is that correct? If so, could you either
> fix that in your patch or open a new bug that depends on bug 57840?
> 4. Add a javadoc to XSSFWorkbook.getTable. This method rebuilds the table
> cache and returns a single item. If the cache isn't used elsewhere, then a
> non-caching linear search would be faster and use less memory.

Updated attachment has the requested changes.  Thanks for the feedback, I've
been a long time user but this is my first patch.  Here are responses to your
questions and comments:

1. Yes, changed.  I missed this from the original patch I started from, thanks.

2. Done.  The example test I picked turned out to be one not yet updated :)

3. Structured Reference syntax doesn't reference individual rows directly, so
shifting things doesn't affect these.  See the MS documentation link in the
initial description.  All structured reference syntax is to table columns and
parts (#Data, #Headers, etc.)  Formula evaluation dynamically constructs a
3DArea for the reference based on the current Table definition (I'm only
caching references to the XSSFTable objects, not their start/end cell
definitions).  Since those objects already existed, my assumption was that
inserting/deleting rows above a XSSFTable already updates the table start/end.
If that's not the case, that's an existing bug - I haven't tried it.

4. Added JavaDoc

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

--- Comment #10 from Javen O'Neal <[hidden email]> ---
To make it easier to get this to a trunk-eligible state, I've created
xssf_structured_references branch [1] in r1747607 so you and others can
contribute unit tests and improvements with smaller diffs against this branch.
I've applied your changes from attachment 33932 in r1747612.

When this is trunk-eligible, I'll merge this back with trunk.

[1] https://svn.apache.org/viewvc/poi/branches/xssf_structured_references/

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

--- Comment #11 from Javen O'Neal <[hidden email]> ---
Moved getTable cache from XSSFWorkbook to XSSFEvaluationWorkbook to reduce the
frequency of problems with a stale cache in r1747615-r1747616.

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

--- Comment #12 from Greg Woolsey <[hidden email]> ---
(In reply to Javen O'Neal from comment #11)
> Moved getTable cache from XSSFWorkbook to XSSFEvaluationWorkbook to reduce
> the frequency of problems with a stale cache in r1747615-r1747616.

This kills performance - the reason I moved it to XSSFWorkbook in the first
place was performance.

Parsing and evaluating formulas on a moderately sized workbook (one table with
6 columns and 50,00 rows, one computed column) and a sheet with a small table
of formulas referencing the data table (VLOOKUPs mostly) took over an hour on a
P5 with 2GB allocated to the VM.

Moving it to XSSFWorkbook reduced this to 6 minutes, which is still awful, the
the problem at that point was the auto-boxing of rownums in all the row methods
in XSSFSheet (15% of total time spent in Integer.compareTo).  Lots of
references via Google for performance tests with auto-boxing vs. primitives vs.
explicitly referenced and retained primitive wrappers (Integer/Long).

The table lookup was performed hundreds of thousands of times, as every cell
evaluation created a new instance of XSSFEvaluationWorkbook, causing the cache
to be rebuilt.  Even just doing a linear search, if there are 3 or 4 tables,
regardless of size, is hugely expensive because the XMLBeans package is
terribly slow for repeated lookups.

I tried to balance the inconvenience/problem of a stale cache with performance.

I think, but haven't verified yet, that the real performance issue is that when
evaluating all the cell formulas in a workbook, if multiple cells reference the
same expression, the expression isn't cached, but recalculated over and over
for all those cells.  This is probably worst with range functions, when a
column of cells do the same VLOOKUP over and over with different inputs against
the same range.  Especially if that range contains formula cells.

All that to say, moving this back is going to have an extreme performance cost,
and the real issues are deep.  They probably deserve their own separate issues:

1. formula evaluations that require access to XML elements/attributes
2. XSSFSheet rownum auto-boxing
3. Formula evaluation intermediate result caching

Eventually I'll try to capture just how many table lookups by name are done vs.
how many formulas reference the table, to see how bad it is.

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 57840] [PATCH] Support for structured references with Excel tables.

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

--- Comment #13 from Javen O'Neal <[hidden email]> ---
Could you generate and upload a workbook with either a lot of tables or large
tables that demonstrates the performance problems?

Unfortunately we don't have a meaningful way to track execution time in our
continuous integration tests because of differing resources provided by the
executors. Nonetheless, you could write a unit test that calculates the time
spent evaluating structured references for A/B comparisons (such as the change
from comment 11).

Are tables being looked up on the XSSFWorkbook or the XSSFEvaluationWorkbook?

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

123