[Bug 64165] New: Can't read in rows without 'r' attribute

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug 64165] New: Can't read in rows without 'r' attribute

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64165

            Bug ID: 64165
           Summary: Can't read in rows without 'r' attribute
           Product: POI
           Version: 4.1.x-dev
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XSSF
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

*I have a proposed fix for this (at least for XSSF). I'll attach a patch file
shortly, I just wanted to secure a Bugzilla number first.

When creating a workbook from an existing file, I noticed that rows without an
'r' XML attribute (used for a row number in this case) are not indexed
correctly in the sheet object. See XSSFSheet.initRows() below. getRowNum()
returns -1 for all rows and all but the last row is lost.

My version of Excel can handle these rows, it just reads them in consecutively.
When saving a workbook, it will add the 'r' attribute. I ran into this issue
when trying to read in a file that was exported from a JavaScript spreadsheet
tool. The authors of that tool claim their XML meets Microsoft's standards.

Current code:
    private void initRows(CTWorksheet worksheetParam) {
        _rows.clear();
        tables = new TreeMap<>();
        sharedFormulas = new HashMap<>();
        arrayFormulas = new ArrayList<>();
        for (CTRow row : worksheetParam.getSheetData().getRowArray()) {
            XSSFRow r = new XSSFRow(row, this);
            // Performance optimization: explicit boxing is slightly faster
            than auto-unboxing, though may use more memory
            //noinspection UnnecessaryBoxing
            final Integer rownumI = Integer.valueOf(r.getRowNum()); //
            NOSONAR
            _rows.put(rownumI, r);
        }
    }

Proposed change:
    private void initRows(CTWorksheet worksheetParam) {
        _rows.clear();
        tables = new TreeMap<>();
        sharedFormulas = new HashMap<>();
        arrayFormulas = new ArrayList<>();
        int rowIndex = 0;
        for (CTRow row : worksheetParam.getSheetData().getRowArray()) {
            XSSFRow r = new XSSFRow(row, this);
            if(r.getRowNum() == -1) {
                r.setRowNum(rowIndex);
                rowIndex++;
            }
            rowIndex = r.getRowNum();
            // Performance optimization: explicit boxing is slightly faster
            than auto-unboxing, though may use more memory
            //noinspection UnnecessaryBoxing
            final Integer rownumI = Integer.valueOf(rowIndex); // NOSONAR
            _rows.put(rownumI, r);
        }
    }

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 64165] Can't read in rows without 'r' attribute

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64165

John Claxton <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #1 from John Claxton <[hidden email]> ---
Created attachment 37030
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37030&action=edit
Proposed patch.tar.gz

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 64165] Can't read in rows without 'r' attribute

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64165

Axel Howind <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #2 from Axel Howind <[hidden email]> ---
IMHO this is a good idea. But consider a file where the row numbers are missing
in between. Say the last row before the rows with missing numbers had the
number n. Then the next row will be assigned the same number. That doesn’t look
right

Another thing to consider is that we might have to re-adjust row numbers of the
following rows. Consider rows 1,2,3 all with correct row numbers. Then 5 rows
without row numbers, followed by a row with number 4. How should row numbers be
handled then? What does Excel do in that case?

Do you have (or can you create) excel sheets for all three case?
 1. row numbers missing from the start of the file
 2. row numbers missing in between
 3. row numbers missing in between, but would use the same row numbers as the
following rows

We should really run this on these test sheets and also create a unit test that
covers all three cases.

Thanks for contributing!
Axel

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 64165] Can't read in rows without 'r' attribute

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64165

--- Comment #3 from John Claxton <[hidden email]> ---
Good point, I didn't really consider sheets with partial row numbers, my
examples were all or nothing. I'll send a new patch file that covers all three
cases.

Thanks,
John Claxton

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 64165] Can't read in rows without 'r' attribute

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64165

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #4 from Dominik Stadler <[hidden email]> ---
Any plans to provide the mentioned updated patch?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 64165] Can't read in rows without 'r' attribute

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64165

John Claxton <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #37030|0                           |1
        is obsolete|                            |

--- Comment #5 from John Claxton <[hidden email]> ---
Created attachment 37733
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37733&action=edit
Patch file for a potential fix & tests.

Here's a new patch, with some caveats.
1. I was having lots of trouble setting up my project and couldn't get tests to
run. It seems like a problem with my Ant setup, not the code itself, so
hopefully the tests all pass. My apologies if they don't. (Through a kluge, I
did run the tests that I added in this patch, and they pass.)
2. For the third case Axel mentioned, I'm not sure what the correct behavior
would be. Excel interprets that as invalid xml, and does its best to recover,
but it's not clear what logic it uses. So I wrote a test for that, but
commented it out for now. There are some further details in a test comment in
the patch file.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 64165] Can't read in rows without 'r' attribute

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64165

John Claxton <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]