Class POIXMLExtractorFactory

java.lang.Object
org.apache.poi.ooxml.extractor.POIXMLExtractorFactory
All Implemented Interfaces:
ExtractorProvider

public final class POIXMLExtractorFactory extends Object implements ExtractorProvider
Figures out the correct POITextExtractor for your supplied document, and returns it.

Note 1 - this factory will fail for many file formats if the POI Scratchpad jar is not present on the runtime classpath.

Note 2 - for most cases you would be better off using Apache Tika rather than this class.

  • Constructor Details

    • POIXMLExtractorFactory

      public POIXMLExtractorFactory()
  • Method Details

    • accepts

      public boolean accepts(FileMagic fm)
      Specified by:
      accepts in interface ExtractorProvider
    • getThreadPrefersEventExtractors

      public static boolean getThreadPrefersEventExtractors()
      Should this thread prefer event-based over usermodel-based extractors? Usermodel extractors tend to be more accurate, but use more memory. Default is false.
    • getAllThreadsPreferEventExtractors

      public static Boolean getAllThreadsPreferEventExtractors()
      Should all threads prefer event-based over usermodel-based extractors? Usermodel extractors tend to be more accurate, but use more memory. Default is null, meaning the thread-level setting is used, which itself defaults to false.
    • setThreadPrefersEventExtractors

      public static void setThreadPrefersEventExtractors(boolean preferEventExtractors)
      Sets whether this thread should prefer event-based over usermodel-based extractors. Only used if the all-threads setting is null.
    • setAllThreadsPreferEventExtractors

      public static void setAllThreadsPreferEventExtractors(Boolean preferEventExtractors)
      Sets whether all threads should prefer event-based over usermodel-based extractors. If set (non-null), takes precedence over the thread-level setting.
    • getPreferEventExtractor

      public static boolean getPreferEventExtractor()
      Should this thread use event-based extractors if available? Checks the all-threads setting first, then the thread-specific one.
    • create

      public POITextExtractor create(File f, String password) throws IOException
      Description copied from interface: ExtractorProvider
      Create Extractor via file
      Specified by:
      create in interface ExtractorProvider
      Parameters:
      f - the file
      password - the password or null if not encrypted
      Returns:
      the extractor
      Throws:
      IOException - if file can't be read or parsed
    • create

      public POITextExtractor create(InputStream inp, String password) throws IOException
      Description copied from interface: ExtractorProvider
      Create Extractor via InputStream
      Specified by:
      create in interface ExtractorProvider
      Parameters:
      inp - the stream
      password - the password or null if not encrypted
      Returns:
      the extractor
      Throws:
      IOException - if stream can't be read or parsed
    • create

      public POIXMLTextExtractor create(OPCPackage pkg) throws IOException
      Tries to determine the actual type of file and produces a matching text-extractor for it.
      Parameters:
      pkg - An OPCPackage.
      Returns:
      A POIXMLTextExtractor for the given file.
      Throws:
      IOException - If an error occurs while reading the file
      IllegalArgumentException - If no matching file type could be found.
    • create

      public POITextExtractor create(POIFSFileSystem fs) throws IOException
      Throws:
      IOException
    • create

      public POITextExtractor create(DirectoryNode poifsDir, String password) throws IOException
      Description copied from interface: ExtractorProvider
      Create Extractor from POIFS node
      Specified by:
      create in interface ExtractorProvider
      Parameters:
      poifsDir - the node
      password - the password or null if not encrypted
      Returns:
      the extractor
      Throws:
      IOException - if node can't be parsed
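
The precedence between the all-threads and thread-level settings described for getPreferEventExtractor can be illustrated with a small stand-alone model. This is a minimal sketch, not the POI implementation: the class names ExtractorPreferenceModel and ExtractorPreferenceDemo are hypothetical, and the use of a ThreadLocal plus a nullable static field is an assumption made only to mirror the documented behaviour (a null all-threads value defers to the thread-level value, which defaults to false).

```java
// Hypothetical stand-in that mirrors the documented precedence rules.
// It does not use or reproduce any Apache POI code.
final class ExtractorPreferenceModel {
    // Thread-level flag; each thread starts with false (assumed storage).
    private static final ThreadLocal<Boolean> THREAD_PREF =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    // All-threads flag; null means "defer to the thread-level flag".
    private static volatile Boolean allThreadsPref = null;

    static void setThreadPrefersEventExtractors(boolean prefer) {
        THREAD_PREF.set(prefer);
    }

    static void setAllThreadsPreferEventExtractors(Boolean prefer) {
        allThreadsPref = prefer;
    }

    static boolean getThreadPrefersEventExtractors() {
        return THREAD_PREF.get();
    }

    static Boolean getAllThreadsPreferEventExtractors() {
        return allThreadsPref;
    }

    // Checks the all-threads flag first, then falls back to the thread flag.
    static boolean getPreferEventExtractor() {
        Boolean all = allThreadsPref;
        return all != null ? all : THREAD_PREF.get();
    }
}

final class ExtractorPreferenceDemo {
    public static void main(String[] args) {
        ExtractorPreferenceModel.setThreadPrefersEventExtractors(true);
        // All-threads flag is null, so the thread flag decides.
        System.out.println(ExtractorPreferenceModel.getPreferEventExtractor()); // true

        ExtractorPreferenceModel.setAllThreadsPreferEventExtractors(Boolean.FALSE);
        // A non-null all-threads flag overrides the thread flag.
        System.out.println(ExtractorPreferenceModel.getPreferEventExtractor()); // false

        ExtractorPreferenceModel.setAllThreadsPreferEventExtractors(null);
        // Resetting to null restores the thread-level behaviour.
        System.out.println(ExtractorPreferenceModel.getPreferEventExtractor()); // true
    }
}
```

In this model, passing null to setAllThreadsPreferEventExtractors is how a caller hands control back to the per-thread flag, which matches the Boolean (rather than boolean) parameter type in the real method signature.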