Interface POITextExtractor

All Superinterfaces:
AutoCloseable, Closeable
All Known Subinterfaces:
POIOLE2TextExtractor, POIXMLTextExtractor
All Known Implementing Classes:
EventBasedExcelExtractor, ExcelExtractor, HPSFPropertiesExtractor, OldExcelExtractor, OutlookTextExtractor, POIXMLPropertiesTextExtractor, PublisherTextExtractor, org.apache.poi.sl.extractor.SlideShowExtractor, VisioTextExtractor, Word6Extractor, WordExtractor, XDGFVisioExtractor, XPSTextExtractor, XSLFEventBasedPowerPointExtractor, XSLFExtractor, XSSFBEventBasedExcelExtractor, XSSFEventBasedExcelExtractor, XSSFExcelExtractor, XWPFEventBasedWordExtractor, XWPFWordExtractor

public interface POITextExtractor extends Closeable
Common parent for text extractors of POI documents. You will typically find the implementation of a given format's text extractor under org.apache.poi.[format].extractor.
  • Method Details

    • getText

      String getText()
      Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the Javadocs for the specific project for details.
      Returns:
      All the text from the document
    • getMetadataTextExtractor

      POITextExtractor getMetadataTextExtractor()
      Returns another text extractor, which is able to output the textual content of the document metadata / properties, such as author and title.
      Returns:
      the metadata text extractor
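
A minimal sketch of how getText() and getMetadataTextExtractor() might be used together inside a try-with-resources block. To keep it runnable without the POI library, it uses a hypothetical stand-in interface (TextSource) and a hypothetical in-memory implementation (InMemoryExtractor); neither is part of POI, and real extractors decide for themselves how text is separated.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in mirroring two of the documented methods; not the real POI type.
interface TextSource extends Closeable {
    String getText();
    TextSource getMetadataTextExtractor();
}

// Hypothetical in-memory extractor used only for illustration.
final class InMemoryExtractor implements TextSource {
    private final String body;
    private final String metadata;

    InMemoryExtractor(String body, String metadata) {
        this.body = body;
        this.metadata = metadata;
    }

    @Override
    public String getText() {
        return body;
    }

    @Override
    public TextSource getMetadataTextExtractor() {
        // The metadata extractor is itself an extractor; its own metadata is empty here.
        return new InMemoryExtractor(metadata, "");
    }

    @Override
    public void close() throws IOException {
        // Nothing to release for an in-memory source.
    }
}

final class ExtractDemo {
    // Returns the metadata text followed by the body text, separated by a newline.
    static String describe(TextSource extractor) {
        try (TextSource ex = extractor) {
            String meta = ex.getMetadataTextExtractor().getText();
            return meta + "\n" + ex.getText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new InMemoryExtractor("Hello world", "Author = Jane")));
    }
}
```

With a real extractor the calling pattern would be the same: obtain the extractor, read the text, and let try-with-resources call close().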
    • setCloseFilesystem

      void setCloseFilesystem(boolean doCloseFilesystem)
      Parameters:
      doCloseFilesystem - true (default), if underlying resources/filesystem should be closed on close()
    • isCloseFilesystem

      boolean isCloseFilesystem()
      Returns:
      true, if resources/filesystem should be closed on close()
    • getFilesystem

      Closeable getFilesystem()
      Returns:
      The underlying resources/filesystem
    • close

      default void close() throws IOException
      Frees the resources of the extractor as soon as it is no longer needed. This may include closing open file handles and freeing memory. The extractor cannot be used after close has been called.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException
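
The interplay between setCloseFilesystem, isCloseFilesystem, getFilesystem and close can be sketched as follows. This is an illustration under stated assumptions, not the library's code: ClosingExtractor, SimpleExtractor and TrackedResource are hypothetical stand-ins, and the body of the default close() shown here is an assumed behaviour inferred from the parameter descriptions above (close the underlying resource only when the flag is true).

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in mirroring the documented methods; not the real POI interface.
interface ClosingExtractor extends Closeable {
    void setCloseFilesystem(boolean doCloseFilesystem);
    boolean isCloseFilesystem();
    Closeable getFilesystem();

    // Assumed behaviour: close the underlying resource only when the flag is set.
    @Override
    default void close() throws IOException {
        Closeable fs = getFilesystem();
        if (isCloseFilesystem() && fs != null) {
            fs.close();
        }
    }
}

// Hypothetical resource that records whether it has been closed.
final class TrackedResource implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical extractor that only holds a resource and the flag.
final class SimpleExtractor implements ClosingExtractor {
    private final Closeable filesystem;
    private boolean closeFilesystem = true; // the documented default is true

    SimpleExtractor(Closeable filesystem) {
        this.filesystem = filesystem;
    }

    @Override
    public void setCloseFilesystem(boolean doCloseFilesystem) {
        this.closeFilesystem = doCloseFilesystem;
    }

    @Override
    public boolean isCloseFilesystem() {
        return closeFilesystem;
    }

    @Override
    public Closeable getFilesystem() {
        return filesystem;
    }
}

final class CloseDemo {
    // Closes an extractor over a fresh resource and reports whether the resource was closed.
    static boolean resourceClosedAfterClose(boolean closeFilesystem) {
        TrackedResource resource = new TrackedResource();
        try (SimpleExtractor extractor = new SimpleExtractor(resource)) {
            extractor.setCloseFilesystem(closeFilesystem);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return resource.closed;
    }

    public static void main(String[] args) {
        System.out.println("flag=true  -> resource closed: " + resourceClosedAfterClose(true));
        System.out.println("flag=false -> resource closed: " + resourceClosedAfterClose(false));
    }
}
```

Setting the flag to false is useful when the caller owns the underlying resource and wants to keep using it after the extractor has been closed.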
    • getDocument

      Object getDocument()
      Returns:
      the processed document