Package org.apache.poi.extractor
Interface POIOLE2TextExtractor
- All Superinterfaces:
AutoCloseable, Closeable, POITextExtractor
- All Known Implementing Classes:
EventBasedExcelExtractor, ExcelExtractor, HPSFPropertiesExtractor, OutlookTextExtractor, PublisherTextExtractor, VisioTextExtractor, Word6Extractor, WordExtractor
Common parent for OLE2-based text extractors of POI documents, such as .doc and .xls.
You will typically find the implementation of a given format's text extractor under org.apache.poi.[format].extractor.
Method Summary
Modifier and Type, Method, Description:
- default DocumentSummaryInformation getDocSummaryInformation(): Returns the document information metadata for the document.
- POIDocument getDocument(): Return the underlying POIDocument.
- default POITextExtractor getMetadataTextExtractor(): Returns an HPSF powered text extractor for the document properties metadata, such as title and author.
- default DirectoryEntry getRoot(): Return the underlying DirectoryEntry of this document.
- default SummaryInformation getSummaryInformation(): Returns the summary information metadata for the document.
Methods inherited from interface org.apache.poi.extractor.POITextExtractor
close, getFilesystem, getText, isCloseFilesystem, setCloseFilesystem
-
Method Details
-
getDocSummaryInformation
default DocumentSummaryInformation getDocSummaryInformation()
Returns the document information metadata for the document.
- Returns:
- The Document Summary Information or null if it could not be read for this document.
-
getSummaryInformation
default SummaryInformation getSummaryInformation()
Returns the summary information metadata for the document.
- Returns:
- The Summary information for the document or null if it could not be read for this document.
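Because both metadata getters may return null, callers need a guard before reading any property. The following is a minimal sketch of one way to do that with Optional. It is self-contained and does not use the POI classes: SummaryInfoLike, SummarySource, NullSafeSummarySketch and titleOrDefault are hypothetical stand-ins introduced only for illustration, modelling just the "may be null" contract described above.

```java
import java.util.Optional;

// Hypothetical stand-in for a summary object; NOT the POI SummaryInformation class.
interface SummaryInfoLike {
    String getTitle();
}

// Hypothetical stand-in for an extractor whose summary may be null.
interface SummarySource {
    SummaryInfoLike getSummaryInformation();
}

final class NullSafeSummarySketch {
    // Returns the title, or the fallback when the summary or its title is missing.
    static String titleOrDefault(SummarySource extractor, String fallback) {
        return Optional.ofNullable(extractor.getSummaryInformation())
                .map(SummaryInfoLike::getTitle)
                .orElse(fallback);
    }

    public static void main(String[] args) {
        // Models a document whose summary could not be read.
        SummarySource unreadable = () -> null;
        // Models a document with a readable summary.
        SummaryInfoLike info = () -> "Quarterly report";
        SummarySource readable = () -> info;

        System.out.println(titleOrDefault(unreadable, "(untitled)"));
        System.out.println(titleOrDefault(readable, "(untitled)"));
    }
}
```

The same guard would apply to getDocSummaryInformation, which is documented with the same null-on-failure behaviour.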
-
getMetadataTextExtractor
default POITextExtractor getMetadataTextExtractor()
Returns an HPSF powered text extractor for the document properties metadata, such as title and author.
- Specified by:
getMetadataTextExtractor in interface POITextExtractor
- Returns:
- an instance of POITextExtractor that can extract metadata.
-
getRoot
default DirectoryEntry getRoot()
Return the underlying DirectoryEntry of this document.
- Returns:
- the DirectoryEntry that is associated with the POIDocument of this extractor.
-
getDocument
POIDocument getDocument()
Return the underlying POIDocument
- Specified by:
getDocument in interface POITextExtractor
- Returns:
- the underlying POIDocument
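getDocument() is the only method listed here without the default modifier, so an implementing class has to supply just that one method. The sketch below illustrates that pattern with simplified stand-in types. It assumes the default methods derive their results from getDocument(), which the descriptions above suggest but do not spell out. DirEntry, Doc, Ole2ExtractorSketch and DelegationSketch are hypothetical names, not POI classes.

```java
// Hypothetical stand-in for a directory entry; NOT the POI DirectoryEntry.
final class DirEntry {
    final String name;

    DirEntry(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for a document that owns a root directory.
final class Doc {
    private final DirEntry directory;

    Doc(DirEntry directory) {
        this.directory = directory;
    }

    DirEntry getDirectory() {
        return directory;
    }
}

// Sketch of the interface shape: one abstract accessor plus a derived default.
interface Ole2ExtractorSketch {
    // The only method an implementer must provide.
    Doc getDocument();

    // Assumed delegation: the root is taken from the underlying document.
    default DirEntry getRoot() {
        return getDocument().getDirectory();
    }
}

final class DelegationSketch {
    public static void main(String[] args) {
        Doc doc = new Doc(new DirEntry("Root Entry"));
        // A lambda suffices because the interface has a single abstract method.
        Ole2ExtractorSketch extractor = () -> doc;
        System.out.println(extractor.getRoot().name);
    }
}
```

One consequence of this design is that a new format-specific extractor inherits the metadata and root accessors for free once it exposes its document.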
-