Class EventBasedExcelExtractor

java.lang.Object
org.apache.poi.hssf.extractor.EventBasedExcelExtractor
All Implemented Interfaces:
Closeable, AutoCloseable, POIOLE2TextExtractor, POITextExtractor, ExcelExtractor

public class EventBasedExcelExtractor extends Object implements POIOLE2TextExtractor, ExcelExtractor
A text extractor for Excel files that is based on the HSSF EventUserModel API. It typically uses less memory than ExcelExtractor, but may not provide the same richness of formatting. It returns the textual content of the file in a form suitable for indexing by something like Lucene; the output is not really intended for display to the user.

To turn an Excel file into CSV or a similar format, see the XLS2CSVmra example.
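As a rough illustration of why an event-based extractor can use less memory, the sketch below handles one record at a time and keeps only the output buffer, instead of building a full workbook model first. Everything in it is a hypothetical stand-in rather than a POI type: `StreamingTextSketch`, `CellEvent` and `demo` are invented for this example, and the two flags only mirror the intent of `setIncludeSheetNames` and `setFormulasNotResults`. It writes one line per event purely for simplicity and does not reproduce the real extractor's output layout.

```java
import java.util.Arrays;
import java.util.List;

/** Conceptual sketch only: hypothetical types, not part of Apache POI. */
final class StreamingTextSketch {

    /** Hypothetical event: either the start of a sheet or a single cell. */
    static final class CellEvent {
        final String sheetName; // non-null marks the start of a sheet
        final String value;     // cached cell value
        final String formula;   // formula text, or null for a plain cell

        private CellEvent(String sheetName, String value, String formula) {
            this.sheetName = sheetName;
            this.value = value;
            this.formula = formula;
        }

        static CellEvent sheet(String name) { return new CellEvent(name, null, null); }
        static CellEvent cell(String value) { return new CellEvent(null, value, null); }
        static CellEvent formula(String formula, String value) {
            return new CellEvent(null, value, formula);
        }
    }

    private boolean includeSheetNames = true;   // mirrors the documented default
    private boolean formulasNotResults = false; // mirrors the documented default
    private final StringBuilder text = new StringBuilder();

    void setIncludeSheetNames(boolean include) { includeSheetNames = include; }
    void setFormulasNotResults(boolean formulas) { formulasNotResults = formulas; }

    /** Handles one event and then forgets it; only the text buffer grows. */
    void onEvent(CellEvent e) {
        if (e.sheetName != null) {
            if (includeSheetNames) {
                text.append(e.sheetName).append('\n');
            }
        } else if (e.formula != null && formulasNotResults) {
            text.append(e.formula).append('\n');
        } else {
            text.append(e.value).append('\n');
        }
    }

    String getText() { return text.toString(); }

    /** Feeds a small, made-up event stream through the sketch. */
    static String demo(boolean sheetNames, boolean formulas) {
        List<CellEvent> events = Arrays.asList(
                CellEvent.sheet("Sheet1"),
                CellEvent.cell("Total"),
                CellEvent.formula("SUM(A1:A3)", "6"));
        StreamingTextSketch sketch = new StreamingTextSketch();
        sketch.setIncludeSheetNames(sheetNames);
        sketch.setFormulasNotResults(formulas);
        for (CellEvent e : events) {
            sketch.onEvent(e);
        }
        return sketch.getText();
    }

    public static void main(String[] args) {
        System.out.print(demo(true, false));
    }
}
```

With the default flags the sketch emits the sheet name, the plain cell and the formula result; calling `demo(false, true)` drops the sheet name and emits the formula text instead of its result.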

  • Constructor Details

    • EventBasedExcelExtractor

      public EventBasedExcelExtractor(DirectoryNode dir)
    • EventBasedExcelExtractor

      public EventBasedExcelExtractor(POIFSFileSystem fs)
  • Method Details

    • getDocSummaryInformation

      public DocumentSummaryInformation getDocSummaryInformation()
      Not supported by this extractor. If it were supported, this method would return the document information metadata for the document.
      Specified by:
      getDocSummaryInformation in interface POIOLE2TextExtractor
      Returns:
      The Document Summary Information or null if it could not be read for this document.
    • getSummaryInformation

      public SummaryInformation getSummaryInformation()
      Not supported by this extractor. If it were supported, this method would return the summary information metadata for the document.
      Specified by:
      getSummaryInformation in interface POIOLE2TextExtractor
      Returns:
      The Summary information for the document or null if it could not be read for this document.
    • setIncludeCellComments

      public void setIncludeCellComments(boolean includeComments)
      Not supported by this extractor. If it were supported, this method would control the inclusion of cell comments from the document.
      Specified by:
      setIncludeCellComments in interface ExcelExtractor
      Parameters:
      includeComments - true if cell comments should be included
    • setIncludeHeadersFooters

      public void setIncludeHeadersFooters(boolean includeHeadersFooters)
      Not supported by this extractor. If it were supported, this method would control the inclusion of headers and footers from the document.
      Specified by:
      setIncludeHeadersFooters in interface ExcelExtractor
      Parameters:
      includeHeadersFooters - true if headers and footers should be included
    • setIncludeSheetNames

      public void setIncludeSheetNames(boolean includeSheetNames)
      Sets whether sheet names are included in the extracted text. The default is true.
      Specified by:
      setIncludeSheetNames in interface ExcelExtractor
      Parameters:
      includeSheetNames - true if the sheet names should be included
    • setFormulasNotResults

      public void setFormulasNotResults(boolean formulasNotResults)
      Sets whether the formula itself is returned rather than the result it produces. The default is false.
      Specified by:
      setFormulasNotResults in interface ExcelExtractor
      Parameters:
      formulasNotResults - true if the formula itself is returned
    • getText

      public String getText()
      Retrieves the text contents of the file.
      Specified by:
      getText in interface ExcelExtractor
      Specified by:
      getText in interface POITextExtractor
      Returns:
      All the text from the document
    • setCloseFilesystem

      public void setCloseFilesystem(boolean doCloseFilesystem)
      Specified by:
      setCloseFilesystem in interface POITextExtractor
      Parameters:
      doCloseFilesystem - true (default), if underlying resources/filesystem should be closed on POITextExtractor.close()
    • isCloseFilesystem

      public boolean isCloseFilesystem()
      Specified by:
      isCloseFilesystem in interface POITextExtractor
      Returns:
      true, if resources/filesystem should be closed on POITextExtractor.close()
    • getFilesystem

      public Closeable getFilesystem()
      Specified by:
      getFilesystem in interface POITextExtractor
      Returns:
      The underlying resources/filesystem
    • getDocument

      public POIDocument getDocument()
      Description copied from interface: POIOLE2TextExtractor
      Return the underlying POIDocument
      Specified by:
      getDocument in interface POIOLE2TextExtractor
      Specified by:
      getDocument in interface POITextExtractor
      Returns:
      the underlying POIDocument
    • getRoot

      public DirectoryEntry getRoot()
      Description copied from interface: POIOLE2TextExtractor
      Return the underlying DirectoryEntry of this document.
      Specified by:
      getRoot in interface POIOLE2TextExtractor
      Returns:
      the DirectoryEntry that is associated with the POIDocument of this extractor.
    • close

      public void close() throws IOException
      Description copied from interface: POITextExtractor
      Frees the resources of the extractor as soon as it is no longer needed. This may include closing open file handles and freeing memory. The extractor cannot be used after close has been called.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in interface POITextExtractor
      Throws:
      IOException
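
The interaction between setCloseFilesystem, isCloseFilesystem and close described above can be sketched with plain java.io types. This is a minimal illustration under stated assumptions, not POI's implementation: `CloseFlagSketch`, `FakeFilesystem`, `SketchExtractor` and `closedAfterUse` are hypothetical names invented here, and the only behaviour modelled is that the underlying resource is closed when the flag is true (the documented default) and left open otherwise.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Conceptual sketch only: hypothetical types, not part of Apache POI. */
final class CloseFlagSketch {

    /** Stand-in for an underlying filesystem; records whether it was closed. */
    static final class FakeFilesystem implements Closeable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Stand-in for an extractor that may or may not own its filesystem. */
    static final class SketchExtractor implements Closeable {
        private final Closeable filesystem;
        private boolean closeFilesystem = true; // mirrors the documented default

        SketchExtractor(Closeable filesystem) {
            this.filesystem = filesystem;
        }

        void setCloseFilesystem(boolean doCloseFilesystem) {
            closeFilesystem = doCloseFilesystem;
        }

        boolean isCloseFilesystem() {
            return closeFilesystem;
        }

        @Override
        public void close() throws IOException {
            // Only release the underlying resource when the flag says we own it.
            if (closeFilesystem && filesystem != null) {
                filesystem.close();
            }
        }
    }

    /** Returns whether the filesystem ended up closed once the extractor was closed. */
    static boolean closedAfterUse(boolean doCloseFilesystem) {
        FakeFilesystem fs = new FakeFilesystem();
        try (SketchExtractor extractor = new SketchExtractor(fs)) {
            extractor.setCloseFilesystem(doCloseFilesystem);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fs.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedAfterUse(true));
        System.out.println(closedAfterUse(false));
    }
}
```

Turning the flag off is useful when the caller opened the filesystem itself and wants to keep using it after the extractor has been closed; in that case the caller remains responsible for closing it.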