[Bug 57699] New: Suport Strict OOXML files

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57699

            Bug ID: 57699
           Summary: Suport Strict OOXML files
           Product: POI
           Version: 3.12-dev
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XSSF
          Assignee: [hidden email]
          Reporter: [hidden email]

Office 2013 added the option to save files as "Strict" OOXML. As reported in
http://stackoverflow.com/questions/29023542/how-to-parse-strict-xlsx-file-in-java
these files have a different core relationship type.

In r1666410 some sample Strict xlsx files were added. Support for them is
needed (for reading at least).

--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

--- Comment #1 from Nick Burch <[hidden email]> ---
It looks like some namespace munging is going to be required to properly
support this. After making changes to ExtractorFactory and POIXMLDocumentPart
to handle the differing core relationship type, it now fails at the xmlbeans
level:

org.apache.xmlbeans.XmlException: error: The document is not a workbook@http://schemas.openxmlformats.org/spreadsheetml/2006/main: document element namespace mismatch expected "http://schemas.openxmlformats.org/spreadsheetml/2006/main" got "http://purl.oclc.org/ooxml/spreadsheetml/main"
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:399)

Caused by: org.apache.xmlbeans.XmlException: error: The document is not a workbook@http://schemas.openxmlformats.org/spreadsheetml/2006/main: document element namespace mismatch expected "http://schemas.openxmlformats.org/spreadsheetml/2006/main" got "http://purl.oclc.org/ooxml/spreadsheetml/main"
    at org.apache.xmlbeans.impl.store.Locale.verifyDocumentType(Locale.java:459)
    at org.apache.xmlbeans.impl.store.Locale.autoTypeDocument(Locale.java:364)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1280)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1264)
    at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorkbookDocument$Factory.parse(Unknown Source)

The purl namespace crops up somewhere in most of the XML files, so a general
mapping solution is probably required if we want to take this further.
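
One possible shape for such a mapping is a plain lookup table from Strict
namespace URIs to their Transitional counterparts. This is only a sketch: the
class name StrictNamespaces and its method are hypothetical and not part of
POI, and the table holds just the one pair visible in the exception above. The
remaining pairs would still need to be collected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not an existing POI class.
class StrictNamespaces {
    private static final Map<String, String> STRICT_TO_TRANSITIONAL = new HashMap<>();

    static {
        // The only pair confirmed by the stack trace; other pairs must be added.
        STRICT_TO_TRANSITIONAL.put(
                "http://purl.oclc.org/ooxml/spreadsheetml/main",
                "http://schemas.openxmlformats.org/spreadsheetml/2006/main");
    }

    /** Returns the Transitional equivalent, or the URI unchanged if no mapping is known. */
    static String toTransitional(String uri) {
        return STRICT_TO_TRANSITIONAL.getOrDefault(uri, uri);
    }
}
```

Unknown URIs pass through untouched, so the lookup can be applied to every
namespace in a part without first checking whether the file is Strict.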

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

Dominik Stadler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #2 from Dominik Stadler <[hidden email]> ---
*** Bug 57914 has been marked as a duplicate of this bug. ***

--- Comment #3 from PJ Fanning <[hidden email]> ---
http://pyxb.sourceforge.net/PyXB-1.2.2/bundles.html has a list of namespace
URLs that could be used in a mapping class.

--- Comment #4 from PJ Fanning <[hidden email]> ---
Without spending much time on this, I have been unable to track down the XSDs
with the purl namespaces (OOXML Strict). By all accounts, they should be very
similar to the OOXML Transitional schemas other than the namespaces.
Two approaches come to mind:
1. In poi-ooxml-schemas, we could create XmlBeans for the OOXML Strict
namespaces by using modified versions of the OOXML Transitional schemas.
2. We could support a transformation of the XML in input docs so that the OOXML
Strict namespaces are replaced by their OOXML Transitional equivalents.
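
As a rough illustration of approach 2, the sketch below copies one XML part
through the JDK's StAX reader and writer and swaps the namespace URI on the
way. The class StrictDownPorter and its methods are hypothetical, only the
spreadsheetml pair from the exception in comment #1 is mapped, and comments,
processing instructions and the XML declaration are dropped for brevity. It is
not the code in any existing converter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch of approach 2; not an existing POI class.
class StrictDownPorter {
    static final String STRICT_MAIN = "http://purl.oclc.org/ooxml/spreadsheetml/main";
    static final String TRANSITIONAL_MAIN =
            "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Maps one namespace URI; null (no namespace) becomes "" for the writer.
    static String map(String uri) {
        if (uri == null) return "";
        return STRICT_MAIN.equals(uri) ? TRANSITIONAL_MAIN : uri;
    }

    static String nonNull(String s) {
        return s == null ? "" : s;
    }

    /** Rewrites one XML part, replacing the Strict namespace with the Transitional one. */
    static String downPort(String xml) {
        try {
            XMLStreamReader in =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            while (in.hasNext()) {
                switch (in.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        w.writeStartElement(nonNull(in.getPrefix()), in.getLocalName(),
                                map(in.getNamespaceURI()));
                        // Re-declare every namespace with its mapped URI.
                        for (int i = 0; i < in.getNamespaceCount(); i++) {
                            w.writeNamespace(nonNull(in.getNamespacePrefix(i)),
                                    map(in.getNamespaceURI(i)));
                        }
                        for (int i = 0; i < in.getAttributeCount(); i++) {
                            String p = nonNull(in.getAttributePrefix(i));
                            if (p.isEmpty()) {
                                w.writeAttribute(in.getAttributeLocalName(i),
                                        in.getAttributeValue(i));
                            } else {
                                w.writeAttribute(p, map(in.getAttributeNamespace(i)),
                                        in.getAttributeLocalName(i), in.getAttributeValue(i));
                            }
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        w.writeEndElement();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                    case XMLStreamConstants.SPACE:
                        w.writeCharacters(in.getText());
                        break;
                    default:
                        break; // comments, PIs and the XML declaration are dropped here
                }
            }
            w.close();
            in.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not rewrite XML part", e);
        }
    }
}
```

A real converter would have to apply this to every part in the package and
also rewrite the relationship types in the .rels files, which this sketch does
not attempt.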

--- Comment #5 from PJ Fanning <[hidden email]> ---
I have added some basic prototype code for converting Strict OOXML files to the
https://github.com/pjfanning/ooxml-strict-converter repository. There is still
a lot of work to do, but I'm posting it here in case anyone wants to review
what I'm doing.

--- Comment #6 from Javen O'Neal <[hidden email]> ---
Looks good so far.

In the interest of committing this early so that we can update our unit tests
to handle XSSF Strict:
* Are we planning on having XSSFWorkbook transparently handle strict workbooks,
or will we have a different class for that?
* Will this be in the o.a.p.xssf.usermodel package, or are we going to package
it in o.a.p.xssf.extractor or create o.a.p.xssf.strict?

In the long term, I would like POI to be able to read and write strict
files without having to down-convert to non-strict. This probably affects how we
go about packaging this, making it more than a distant example or static
utility converter class.

--- Comment #7 from Dominik Stadler <[hidden email]> ---
FYI, there is also a converter provided by Microsoft
(https://www.microsoft.com/en-us/download/details.aspx?id=38828), which could
come in handy when doing development work on this topic.

--- Comment #8 from PJ Fanning <[hidden email]> ---
Hi Javen,
I can understand that we will want to be able to save POI documents using
Strict OOXML, but my focus for now is just on down-porting to Transitional
OOXML to allow parsing.
For now, I'm looking at a standalone utility to down-port, but this could be
plugged into XSSFWorkbook and the XSSF extractor under the hood. They could
pre-process the input doc to determine whether it is Strict OOXML and, if so,
down-port it to a temp file and then read from the temp file.
My prototype code is now working for the SimpleStrict.xlsx in the POI test data
folder.
I'll see about testing with more input files.
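
The pre-processing check described above might look roughly like the following
sketch, which scans the package's _rels/.rels entry for the Strict core
relationship type. Several things here are assumptions: the class
StrictDetector is hypothetical, the relationship type URI is not quoted in this
thread, the part is assumed to be UTF-8, and a substring match stands in for
proper XML parsing. The singleEntryZip helper exists only to build demo input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical pre-processing check; not an existing POI class.
class StrictDetector {
    // Assumed Strict core relationship type (not quoted in this thread).
    static final String STRICT_CORE_REL =
            "http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument";

    /** Returns true if the package's root relationships mention the Strict core type. */
    static boolean isStrict(InputStream pkg) {
        try (ZipInputStream zip = new ZipInputStream(pkg)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("_rels/.rels")) {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[4096];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    // Crude substring match; assumes the part is UTF-8 encoded.
                    String rels = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                    return rels.contains(STRICT_CORE_REL);
                }
            }
            return false; // no root relationships part found
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: builds an in-memory zip holding a single text entry. */
    static byte[] singleEntryZip(String name, String content) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(content.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If isStrict returns true, the caller would down-port the package to a temp file
and open that instead. Because the check consumes the stream, the input would
need to be a file or otherwise re-readable.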

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Suport Strict OOXML files   |Support Strict OOXML files

Sergei Malafeev <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]
