SAX v DOM parser for docx

Allison, Timothy B.
I finally got around to comparing the experimental SAX parser over on Tika with the POI/DOM-based parser for docx on the 170k docx files we have.

http://162.242.228.174/reports/dom_vs_sax_docx.tar.gz

The SAX parser hits fewer exceptions and extracts more content. Both differences are only slight, but overall this looks promising.

