Class XSSFEventBasedExcelExtractor

java.lang.Object
org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor
All Implemented Interfaces:
Closeable, AutoCloseable, POITextExtractor, POIXMLTextExtractor, ExcelExtractor
Direct Known Subclasses:
XSSFBEventBasedExcelExtractor

public class XSSFEventBasedExcelExtractor extends Object implements POIXMLTextExtractor, ExcelExtractor
Implementation of a text extractor from OOXML Excel files that uses SAX event based parsing.
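
To illustrate what SAX event based parsing means here, the following is a minimal sketch that uses only the JDK's SAX parser. It is not POI's actual implementation. It assumes a simplified worksheet XML in which every cell stores its value inline in a <v> element; real sheets also reference shared strings, styles and comments, which this sketch ignores. The class name SaxSheetTextSketch and its extract method are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Apache POI.
final class SaxSheetTextSketch {

    // Streams a simplified worksheet XML and returns the cell values,
    // tab-separated within a row, with a newline after each row.
    static String extract(InputStream sheetXml) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder value = new StringBuilder();
            private boolean inValue;
            private boolean firstCellInRow;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("row".equals(qName)) {
                    firstCellInRow = true;
                } else if ("v".equals(qName)) {
                    // Start collecting the text of this cell value.
                    inValue = true;
                    value.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks.
                if (inValue) {
                    value.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(qName)) {
                    if (!firstCellInRow) {
                        out.append('\t');
                    }
                    out.append(value);
                    firstCellInRow = false;
                    inValue = false;
                } else if ("row".equals(qName)) {
                    out.append('\n');
                }
            }
        };
        try {
            // The default factory is not namespace-aware, so element
            // names arrive in qName.
            SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse sheet XML", e);
        }
        return out.toString();
    }
}
```

Because the handler only reacts to events while the parser streams the input, no in-memory model of the whole sheet is built, which is the usual reason to choose an event based extractor for large files.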
  • Constructor Details

  • Method Details

    • setIncludeSheetNames

      public void setIncludeSheetNames(boolean includeSheetNames)
      Sets whether sheet names are included in the extracted text. Default is true.
      Specified by:
      setIncludeSheetNames in interface ExcelExtractor
      Parameters:
      includeSheetNames - true if the sheet names should be included
    • getIncludeSheetNames

      public boolean getIncludeSheetNames()
      Returns:
      whether to include sheet names
      Since:
      3.16-beta3
    • setFormulasNotResults

      public void setFormulasNotResults(boolean formulasNotResults)
      Sets whether formula cells yield the formula itself rather than the result it produces. Default is false.
      Specified by:
      setFormulasNotResults in interface ExcelExtractor
      Parameters:
      formulasNotResults - true if the formula itself is returned
    • getFormulasNotResults

      public boolean getFormulasNotResults()
      Returns:
      whether to include formulas but not results
      Since:
      3.16-beta3
    • setIncludeHeadersFooters

      public void setIncludeHeadersFooters(boolean includeHeadersFooters)
      Sets whether headers and footers are included in the extracted text. Default is true.
      Specified by:
      setIncludeHeadersFooters in interface ExcelExtractor
      Parameters:
      includeHeadersFooters - true if headers and footers should be included
    • getIncludeHeadersFooters

      public boolean getIncludeHeadersFooters()
      Returns:
      whether or not to include headers and footers
      Since:
      3.16-beta3
    • setIncludeTextBoxes

      public void setIncludeTextBoxes(boolean includeTextBoxes)
      Sets whether text from textboxes is included in the extracted text. Default is true.
    • getIncludeTextBoxes

      public boolean getIncludeTextBoxes()
      Returns:
      whether or not to extract textboxes
      Since:
      3.16-beta3
    • setIncludeCellComments

      public void setIncludeCellComments(boolean includeCellComments)
      Sets whether cell comments are included in the extracted text. Default is false.
      Specified by:
      setIncludeCellComments in interface ExcelExtractor
      Parameters:
      includeCellComments - true if cell comments should be included
    • getIncludeCellComments

      public boolean getIncludeCellComments()
      Returns:
      whether cell comments should be included
      Since:
      3.16-beta3
    • setConcatenatePhoneticRuns

      public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
      Sets whether text from <rPh> (phonetic run) elements in the SharedStringsTable is concatenated into the extracted text. Default is true.
      Parameters:
      concatenatePhoneticRuns - true if runs should be concatenated, false otherwise
    • setLocale

      public void setLocale(Locale locale)
    • getLocale

      public Locale getLocale()
      Returns:
      the locale currently set on this extractor
      Since:
      3.16-beta3
    • getPackage

      public OPCPackage getPackage()
      Returns the opened OPCPackage container.
      Specified by:
      getPackage in interface POIXMLTextExtractor
      Returns:
      the opened OPCPackage
    • getCoreProperties

      public POIXMLProperties.CoreProperties getCoreProperties()
      Returns the core document properties
      Specified by:
      getCoreProperties in interface POIXMLTextExtractor
      Returns:
      the core document properties
    • getExtendedProperties

      public POIXMLProperties.ExtendedProperties getExtendedProperties()
      Returns the extended document properties
      Specified by:
      getExtendedProperties in interface POIXMLTextExtractor
      Returns:
      the extended document properties
    • getCustomProperties

      public POIXMLProperties.CustomProperties getCustomProperties()
      Returns the custom document properties
      Specified by:
      getCustomProperties in interface POIXMLTextExtractor
      Returns:
      the custom document properties
    • processSheet

      public void processSheet(XSSFSheetXMLHandler.SheetContentsHandler sheetContentsExtractor, Styles styles, Comments comments, SharedStrings strings, InputStream sheetInputStream) throws IOException, SAXException
      Processes the given sheet
      Throws:
      IOException
      SAXException
    • getText

      public String getText()
      Processes the file and returns the text
      Specified by:
      getText in interface ExcelExtractor
      Specified by:
      getText in interface POITextExtractor
      Returns:
      All the text from the document
    • getDocument

      public POIXMLDocument getDocument()
      Description copied from interface: POIXMLTextExtractor
      Returns the opened document.
      Specified by:
      getDocument in interface POITextExtractor
      Specified by:
      getDocument in interface POIXMLTextExtractor
      Returns:
      the opened document
    • setCloseFilesystem

      public void setCloseFilesystem(boolean doCloseFilesystem)
      Specified by:
      setCloseFilesystem in interface POITextExtractor
      Parameters:
      doCloseFilesystem - true (default), if underlying resources/filesystem should be closed on POITextExtractor.close()
    • isCloseFilesystem

      public boolean isCloseFilesystem()
      Specified by:
      isCloseFilesystem in interface POITextExtractor
      Returns:
      true, if resources/filesystem should be closed on POITextExtractor.close()
    • getFilesystem

      public OPCPackage getFilesystem()
      Specified by:
      getFilesystem in interface POITextExtractor
      Returns:
      The underlying resources/filesystem
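
The setters, getText() and the close handling documented above can be combined as in the following sketch. It assumes Apache POI (poi-ooxml) and XMLBeans are on the classpath. The constructor taking an OPCPackage is not shown in this section and is assumed from the Apache POI API; the class ExtractorUsageSketch and the method extractPlainText are hypothetical names used only for illustration.

```java
import java.io.File;
import java.io.IOException;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor;
import org.apache.xmlbeans.XmlException;

// Hypothetical helper class for illustration; not part of Apache POI.
final class ExtractorUsageSketch {

    // Returns the cell text of the given .xlsx file without sheet names,
    // headers or footers, but with cell comments.
    static String extractPlainText(File xlsx)
            throws IOException, OpenXML4JException, XmlException {
        // Open read-only so that closing the package never writes to the file.
        try (OPCPackage pkg = OPCPackage.open(xlsx, PackageAccess.READ);
             XSSFEventBasedExcelExtractor extractor =
                     new XSSFEventBasedExcelExtractor(pkg)) {
            // try-with-resources already closes the package, so the
            // extractor is told not to close it as well.
            extractor.setCloseFilesystem(false);
            extractor.setIncludeSheetNames(false);
            extractor.setIncludeHeadersFooters(false);
            extractor.setIncludeCellComments(true);
            // Keep the default: return formula results, not formula text.
            extractor.setFormulasNotResults(false);
            return extractor.getText();
        }
    }
}
```

Calling setCloseFilesystem(false) makes the caller the sole owner of the OPCPackage. If the extractor should own the package instead, the default of true can be kept and only the extractor closed.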