Interface ExtractorProvider

All Known Implementing Classes:
MainExtractorFactory, POIXMLExtractorFactory

public interface ExtractorProvider
  • Method Details

    • accepts

      boolean accepts(FileMagic fm)
    • create

      POITextExtractor create(File file, String password) throws IOException
      Creates an extractor from a file.
      Parameters:
      file - the file
      password - the password or null if not encrypted
      Returns:
      the extractor
      Throws:
      IOException - if file can't be read or parsed
    • create

      POITextExtractor create(InputStream inputStream, String password) throws IOException
      Creates an extractor from an InputStream.
      Parameters:
      inputStream - the stream
      password - the password or null if not encrypted
      Returns:
      the extractor
      Throws:
      IOException - if stream can't be read or parsed
    • create

      POITextExtractor create(DirectoryNode poifsDir, String password) throws IOException
      Creates an extractor from a POIFS directory node.
      Parameters:
      poifsDir - the node
      password - the password or null if not encrypted
      Returns:
      the extractor
      Throws:
      IOException - if node can't be parsed
      IllegalStateException - if processing fails for some other reason, e.g. missing JCE Unlimited Strength Jurisdiction Policy files while handling encrypted files.
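The accepts/create pairing above suggests a dispatch pattern: ask each provider whether it accepts the detected file magic, then let the first match create the extractor. The following is a minimal, self-contained sketch of that pattern only. `Magic`, `TextSource`, `Provider`, `SingleFormatProvider` and `ProviderDemo` are hypothetical stand-ins invented for illustration, not POI classes, and the real interface may be used differently by the library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical stand-ins, NOT POI classes: they only mirror the shape of
// accepts(FileMagic) and create(InputStream, String).
enum Magic { OLE2, OOXML, UNKNOWN }

interface TextSource {
    String getText();
}

interface Provider {
    boolean accepts(Magic magic);
    TextSource create(InputStream in, String password) throws IOException;
}

// A toy provider that handles exactly one magic value.
class SingleFormatProvider implements Provider {
    private final Magic supported;

    SingleFormatProvider(Magic supported) {
        this.supported = supported;
    }

    @Override
    public boolean accepts(Magic magic) {
        return magic == supported;
    }

    @Override
    public TextSource create(InputStream in, String password) throws IOException {
        // null means "not encrypted"; this sketch does not implement decryption.
        if (password != null) {
            throw new IOException("encrypted input is not supported in this sketch");
        }
        String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        return () -> supported + ":" + text;
    }
}

class ProviderDemo {
    // Ask each provider in turn and use the first one that accepts the magic.
    static TextSource open(List<Provider> providers, Magic magic,
                           InputStream in, String password) throws IOException {
        for (Provider p : providers) {
            if (p.accepts(magic)) {
                return p.create(in, password);
            }
        }
        throw new IOException("no provider accepts " + magic);
    }

    public static void main(String[] args) throws IOException {
        List<Provider> providers = List.of(
                new SingleFormatProvider(Magic.OLE2),
                new SingleFormatProvider(Magic.OOXML));
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(open(providers, Magic.OOXML, in, null).getText());
    }
}
```

Passing null as the password follows the convention documented for the create methods; how an actual implementation handles a non-null password is not covered by this sketch.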
    • identifyEmbeddedResources

      default void identifyEmbeddedResources(POIOLE2TextExtractor ext, List<Entry> dirs, List<InputStream> nonPOIFS) throws IOException
      Identifies the documents embedded in the file, if there are any. The method returns nothing; results are delivered through the two list arguments: dirs receives directory references for embedded documents, and nonPOIFS receives streams that are not based on POIFS entries.
      Parameters:
      ext - the extractor holding the directory to start parsing
      dirs - a list to be filled with directory references holding embedded documents
      nonPOIFS - a list to be filled with streams which aren't based on POIFS entries
      Throws:
      IOException - when the format-specific extraction fails because of invalid entries
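Because identifyEmbeddedResources is declared void and takes two lists "to be filled", the caller creates the lists and reads the results back from them after the call. The sketch below illustrates only that out-parameter style. `Node`, `EmbeddedDemo`, and the rule that a node with raw bytes counts as a non-directory stream are hypothetical, chosen for illustration, and do not reflect how any POI implementation classifies entries.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a directory entry; not a POI type.
class Node {
    final String name;
    final byte[] rawData; // null for directory-like nodes (assumption of this sketch)

    Node(String name, byte[] rawData) {
        this.name = name;
        this.rawData = rawData;
    }
}

class EmbeddedDemo {
    // Mirrors the void, fill-the-lists shape: nothing is returned,
    // results are appended to the two caller-supplied lists.
    static void identifyEmbedded(List<Node> children, List<Node> dirs, List<InputStream> streams) {
        for (Node child : children) {
            if (child.rawData == null) {
                dirs.add(child);
            } else {
                streams.add(new ByteArrayInputStream(child.rawData));
            }
        }
    }

    public static void main(String[] args) {
        List<Node> children = List.of(
                new Node("embedded-doc", null),
                new Node("attachment", new byte[] {1, 2, 3}));
        List<Node> dirs = new ArrayList<>();
        List<InputStream> streams = new ArrayList<>();
        identifyEmbedded(children, dirs, streams);
        System.out.println(dirs.size() + " directory entries, " + streams.size() + " raw streams");
    }
}
```

The lists passed in must be mutable, since the method appends to them.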