Class AbstractOOXMLExtractor
java.lang.Object
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
- All Implemented Interfaces:
OOXMLExtractor
- Direct Known Subclasses:
POIXMLTextExtractorDecorator, SXSLFPowerPointExtractorDecorator, SXWPFWordExtractorDecorator, XPSExtractorDecorator, XSLFPowerPointExtractorDecorator, XSSFExcelExtractorDecorator, XWPFWordExtractorDecorator
Base class for all Tika OOXML extractors.
Tika extractors decorate POI extractors so that the parsed content of
documents is returned as a sequence of XHTML SAX events. Subclasses must
implement the buildXHTML(XHTMLContentHandler) method, which
populates the XHTMLContentHandler object it receives as a parameter.
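The contract described above (the base class drives the parse, the subclass fills in the body) can be illustrated with a minimal sketch. It uses only the JDK's SAX classes: `SketchExtractor`, `HelloExtractor`, and `render` are hypothetical names invented for illustration, and a plain `ContentHandler` stands in for Tika's `XHTMLContentHandler`. It is not Tika's actual implementation.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class ExtractorSketch {

    /** Hypothetical stand-in for the base class: it owns the document envelope. */
    abstract static class SketchExtractor {
        private static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

        /** Template method: emits the envelope and delegates the body to the subclass. */
        public final void getXHTML(ContentHandler handler) throws SAXException {
            handler.startDocument();
            start(handler, "html");
            start(handler, "body");
            buildXHTML(handler); // subclass-specific content goes here
            end(handler, "body");
            end(handler, "html");
            handler.endDocument();
        }

        /** Subclasses populate the handler with the document body. */
        protected abstract void buildXHTML(ContentHandler handler) throws SAXException;

        protected static void start(ContentHandler h, String name) throws SAXException {
            h.startElement(XHTML_NS, name, name, new AttributesImpl());
        }

        protected static void end(ContentHandler h, String name) throws SAXException {
            h.endElement(XHTML_NS, name, name);
        }
    }

    /** Hypothetical subclass: writes a single paragraph. */
    static class HelloExtractor extends SketchExtractor {
        @Override
        protected void buildXHTML(ContentHandler handler) throws SAXException {
            start(handler, "p");
            char[] text = "Hello".toCharArray();
            handler.characters(text, 0, text.length);
            end(handler, "p");
        }
    }

    /** Runs the sketch and returns the received SAX events as markup-like text. */
    static String render() throws SAXException {
        StringBuilder out = new StringBuilder();
        DefaultHandler sink = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.append('<').append(localName).append('>');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(localName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        };
        new HelloExtractor().getXHTML(sink);
        return out.toString();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(render());
    }
}
```

Making the envelope method `final` and the body method `abstract` is one way to keep every subclass's output consistently framed. Whether the real class enforces this the same way is not stated on this page.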
Constructor Summary
Constructors
Method Summary
Modifier and Type / Method / Description
- getDocument: Returns the opened document.
- getMetadataExtractor: POIXMLTextExtractor.getMetadataTextExtractor() not yet supported for OOXML by POI.
- void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context): Parses the document into a sequence of XHTML SAX events sent to the given content handler.
Constructor Details
AbstractOOXMLExtractor
Method Details
getDocument
Description copied from interface: OOXMLExtractor
Returns the opened document.
- Specified by:
getDocument in interface OOXMLExtractor
getMetadataExtractor
Description copied from interface: OOXMLExtractor
POIXMLTextExtractor.getMetadataTextExtractor() not yet supported for OOXML by POI.
- Specified by:
getMetadataExtractor in interface OOXMLExtractor
getXHTML
public void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) throws SAXException, XmlException, IOException, TikaException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
- Specified by:
getXHTML in interface OOXMLExtractor
- Throws:
SAXException
XmlException
IOException
TikaException
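On the consuming side, the handler passed to getXHTML receives the SAX events as they are produced. The following is a minimal sketch of such a handler, built only on the JDK's `DefaultHandler`. `TextCollector` and `demo` are hypothetical names, and the event stream is simulated by hand rather than produced by a real extractor, so that the example runs without Tika or POI on the classpath.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollectorSketch {

    /** Hypothetical handler: gathers character data, one line per paragraph. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Treat the end of each <p> element as a line break.
            if ("p".equals(localName)) {
                text.append('\n');
            }
        }

        String getText() {
            return text.toString();
        }
    }

    /** Feeds a hand-written event stream to the collector (assumption: stands in for an extractor). */
    static String demo() throws SAXException {
        String ns = "http://www.w3.org/1999/xhtml";
        TextCollector collector = new TextCollector();
        collector.startDocument();
        for (String paragraph : new String[] {"First", "Second"}) {
            collector.startElement(ns, "p", "p", new AttributesImpl());
            char[] chars = paragraph.toCharArray();
            collector.characters(chars, 0, chars.length);
            collector.endElement(ns, "p", "p");
        }
        collector.endDocument();
        return collector.getText();
    }

    public static void main(String[] args) throws SAXException {
        System.out.print(demo());
    }
}
```

In a real caller, an instance of such a handler would be passed as the first argument of getXHTML together with a Metadata and a ParseContext, and the caller would need to handle the four checked exceptions listed above.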