[Bug 61296] New: Bring over missing constants from Tika

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61296

            Bug ID: 61296
           Summary: Bring over missing constants from Tika
           Product: POI
           Version: 3.17-dev
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: POI Overall
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

In Apache Tika, under
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/, there's now a
surprisingly large number of POI and OOXML constants in the parser codebase.

We should review these, add our own constants where we don't already have them
(e.g. relationships or types we don't have defined), then switch the Tika
classes over to our constants after a release.
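
To make the proposal concrete, here is a minimal sketch of what a shared constants holder could look like. The class name `OoxmlConstantsSketch`, its field names, and the helper method are hypothetical and are not an existing POI or Tika API. The string values are well-known OOXML identifiers chosen only for illustration; this report does not say whether they are among the constants POI is missing.

```java
/** Hypothetical holder for OOXML string constants; not an actual POI class. */
final class OoxmlConstantsSketch {
    private OoxmlConstantsSketch() {}

    /** Relationship type of the main document part in an OOXML package. */
    static final String CORE_DOCUMENT_REL =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    /** Relationship type used for hyperlinks. */
    static final String HYPERLINK_REL =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink";

    /** MIME type of a .docx file. */
    static final String DOCX_MIME =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    /** True if the given relationship type is one this sketch defines. */
    static boolean isKnownRelationship(String type) {
        return CORE_DOCUMENT_REL.equals(type) || HYPERLINK_REL.equals(type);
    }

    public static void main(String[] args) {
        System.out.println(isKnownRelationship(CORE_DOCUMENT_REL));
    }
}
```

With the strings defined once, a downstream parser could reference `OoxmlConstantsSketch.CORE_DOCUMENT_REL` instead of repeating the literal, which is the kind of swap described above.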

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[Bug 61296] Bring over missing constants from Tika

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61296

--- Comment #1 from Javen O'Neal <[hidden email]> ---
Created attachment 35138
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35138&action=edit
a quick comparison of Tika and POI constants

https://github.com/apache/tika/tree/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/

git clone https://github.com/apache/tika.git apache-tika
pushd apache-tika
cd tika-parsers/src/main/java/org/apache/tika/parser/microsoft/
grep -r -P "(static final|final static|http://schemas|vnd|urn)" .

Most notably:
* ./ooxml/AbstractOOXMLExtractor.java has 8 relationship schema URLs and 1
  OOXML MIME type
* ./ooxml/OOXMLWordAndPowerPointTextHandler.java has 6 schema URLs and 2 URNs
* ./POIFSContainerDetector.java has several MIME types

A few other files contain constants as well.
See the attachment for a list of current constants that could be copied over.
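
For anyone who wants to repeat the survey without GNU grep, the search above can be approximated in Java. This is only a sketch: the class `ConstantScanSketch` and its methods are hypothetical, it looks at `.java` files only (the grep command searches every file), and it assumes the sources are UTF-8. It also deviates from the grep pattern in one place: the bare `urn` alternative matches every `return` statement, so the sketch tightens it to `urn:`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, not part of POI or Tika. */
final class ConstantScanSketch {
    private ConstantScanSketch() {}

    // Same alternation as the grep pattern, except "urn" is tightened to
    // "urn:" so that ordinary "return" statements are not reported.
    static final Pattern CANDIDATE =
            Pattern.compile("static final|final static|http://schemas|vnd|urn:");

    /** Keeps only the lines that contain a candidate constant, trimmed. */
    static List<String> matchingLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (CANDIDATE.matcher(line).find()) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    /** Scans every .java file under root; returns "relative/path: line" entries. */
    static List<String> scan(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile)
                         .filter(p -> p.toString().endsWith(".java"))
                         .sorted()
                         .collect(Collectors.toList());
        }
        List<String> report = new ArrayList<>();
        for (Path file : files) {
            // Assumes the sources are valid UTF-8.
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            for (String hit : matchingLines(lines)) {
                report.add(root.relativize(file) + ": " + hit);
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // Pass the directory to scan as the first argument; defaults to ".".
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        scan(root).forEach(System.out::println);
    }
}
```

Run from the `microsoft/` directory with no arguments, this lists candidate lines per file in the same spirit as the grep output, minus the `return` noise.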

--- Comment #2 from Javen O'Neal <[hidden email]> ---
r1801901

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #35138|text/tab-separated-values   |text/csv
          mime type|                            |

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #35138|text/csv                    |text/tab-separated-values
          mime type|                            |

--- Comment #3 from Javen O'Neal <[hidden email]> ---
r1801903

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #35138|0                           |1
        is obsolete|                            |

--- Comment #4 from Javen O'Neal <[hidden email]> ---
Created attachment 35139
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35139&action=edit
a quick comparison of Tika and POI constants

Javen O'Neal <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|Linux                       |All

--- Comment #5 from Tim Allison <[hidden email]> ---
Yup.  Sorry.  I've been meaning to do this.  Thank you, Nick and Javen!

Speaking of which...is there any interest in moving over the SAX-based
docx/pptx code from Tika into POI?

--- Comment #6 from Javen O'Neal <[hidden email]> ---
Yes, absolutely!
