I'm currently cleaning up the various X**F / H**F factories incl. extractors, with the following changes (= API breaks):
a) provide a service locator pattern to conform with Jigsaw - see WorkbookFactory / -Provider for how it would look.
Although chances are that won't work with OSGi, we need that approach with Jigsaw.
There are workarounds for this: http://www.basepatterns.org/java/2009/09/30/osgi-service-locator.html.
Btw. the integration tests are currently not executing the OOXML extractor, as I forgot to update a reflection class name string.
b) the above leads to service provider instances, therefore I replace static factory methods with instance methods.
Furthermore I rename all methods like "createXY" (e.g. createWorkbook) to "create(...)".
This is just more straightforward than e.g. POIXMLExtractorFactory a; a.createExtractor(...) - i.e. I already have the extractor factory handle, so why would I need to repeat that I'd like to create an extractor ...
I haven't planned to provide a generic factory interface yet, but that would make things easier later on.
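
To illustrate a) and b), here is a rough sketch of how provider instances with a plain create(...) method could be looked up. All names (DocProvider, Ole2DocProvider, OoxmlDocProvider, DocFactory, addProvider) are made up for the example and are not the actual POI classes. Only java.util.ServiceLoader is real JDK API. The manual addProvider() hook is just one assumed way to cope with environments where ServiceLoader discovery doesn't work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for a provider interface such as a workbook provider.
interface DocProvider {
    boolean accepts(String fileName);

    // Instance method simply named "create" instead of a static "createXY".
    String create(String fileName);
}

// Hypothetical provider for OLE2-style files.
class Ole2DocProvider implements DocProvider {
    public boolean accepts(String fileName) { return fileName.endsWith(".xls"); }
    public String create(String fileName) { return "OLE2:" + fileName; }
}

// Hypothetical provider for OOXML-style files.
class OoxmlDocProvider implements DocProvider {
    public boolean accepts(String fileName) { return fileName.endsWith(".xlsx"); }
    public String create(String fileName) { return "OOXML:" + fileName; }
}

// Hypothetical locator that delegates to the first provider accepting the input.
class DocFactory {
    private final List<DocProvider> providers = new ArrayList<>();

    DocFactory() {
        // Picks up providers registered via module-info ("provides ... with ...")
        // or META-INF/services; yields nothing if none are registered.
        ServiceLoader.load(DocProvider.class).forEach(providers::add);
    }

    // Assumed fallback for containers where ServiceLoader finds nothing.
    void addProvider(DocProvider provider) {
        providers.add(provider);
    }

    String create(String fileName) {
        for (DocProvider p : providers) {
            if (p.accepts(fileName)) {
                return p.create(fileName);
            }
        }
        throw new IllegalArgumentException("No provider for " + fileName);
    }
}
```

With Jigsaw, the module descriptor would declare "uses" on the consumer side and "provides ... with ..." on the provider side, so the no-arg constructor above finds the implementations without any reflection on class name strings.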
c) remove main() methods in the extractors. Those look like test methods and I don't want our source verification tools to complain about poor command line handling.
d) use interfaces instead of abstract classes - ONLY for POI*TextExtractor
Those abstract classes only add minimal logic, which can be handled by default methods.
On the other hand, if I keep the abstract classes, I have problems with common classes like SlideShowExtractor and a new sub extractor for OOXML that offers additional OOXML logic.
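
A small sketch of what d) is aiming at, as I read it: with single inheritance a sub extractor can't extend both a common extractor class and an abstract format-specific base, whereas interfaces with default methods can be mixed in. The names below (TextExtractor, OoxmlExtras, CommonSlideShowExtractor, OoxmlSlideShowExtractor) and the method bodies are invented for illustration and don't match the real POI types.

```java
// Hypothetical replacement for an abstract text extractor base class.
interface TextExtractor {
    String getText();

    // Minimal shared logic that previously would have lived in the abstract class.
    default String getTrimmedText() {
        return getText().trim();
    }
}

// Hypothetical interface carrying the additional OOXML-specific logic.
interface OoxmlExtras {
    default String getPackageInfo() {
        return "OOXML package";
    }
}

// Hypothetical common extractor shared by both formats.
class CommonSlideShowExtractor implements TextExtractor {
    public String getText() {
        return "  slide text \n";
    }
}

// The OOXML sub extractor reuses the common class AND picks up the OOXML logic,
// which wouldn't compile if OoxmlExtras were an abstract class.
class OoxmlSlideShowExtractor extends CommonSlideShowExtractor implements OoxmlExtras {
}
```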
e) remove a few obsolete constructs like PowerPointExtractor
f) rename factories
org.apache.poi.ooxml.extractor.ExtractorFactory -> org.apache.poi.ooxml.extractor.POIXMLExtractorFactory
org.apache.poi.extractor.OLE2ExtractorFactory -> org.apache.poi.extractor.ExtractorFactory
It's confusing that the specialized factory has a generic name and vice versa.