Class OfficeParser

    • Constructor Detail

      • OfficeParser

        public OfficeParser()
    • Method Detail

      • extractMacros

        public static void extractMacros(org.apache.poi.poifs.filesystem.POIFSFileSystem fs,
                                         ContentHandler xhtml,
                                         org.apache.tika.extractor.EmbeddedDocumentExtractor embeddedDocumentExtractor)
                                  throws IOException,
                                         SAXException
        Helper that extracts macros from an NPOIFS file system or a vbaProject.bin.

        As of POI-3.15-final, VBAMacroReader still has some bugs, so for now this method swallows NullPointerException and other runtime exceptions thrown during macro extraction.

        Parameters:
        fs - NPOIFS to extract from
        xhtml - SAX writer
        embeddedDocumentExtractor - extractor for embedded documents
        Throws:
        IOException - if an I/O error occurs while extracting an embedded document
        SAXException - if an error occurs while writing SAX events to xhtml
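The note on VBAMacroReader describes a defensive pattern: checked I/O failures still reach the caller, while runtime exceptions from a buggy reader are swallowed so that one bad macro does not abort the whole parse. The following is a minimal, stdlib-only sketch of that pattern, not Tika's actual implementation. The `MacroSource` interface and the `readMacrosLeniently` method are hypothetical stand-ins for POI's VBAMacroReader and the extraction loop.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class LenientMacroExtraction {

    /** Hypothetical stand-in for a macro reader such as POI's VBAMacroReader. */
    public interface MacroSource {
        Map<String, String> readMacros() throws IOException;
    }

    /**
     * Reads macros (module name to source code). IOException propagates to the
     * caller; runtime exceptions such as NullPointerException are swallowed and
     * an empty map is returned instead.
     */
    public static Map<String, String> readMacrosLeniently(MacroSource source) throws IOException {
        try {
            Map<String, String> macros = source.readMacros();
            if (macros == null) {
                return Collections.emptyMap();
            }
            return new LinkedHashMap<>(macros);
        } catch (RuntimeException e) {
            // Swallow reader bugs so the rest of the document can still be parsed.
            return Collections.emptyMap();
        }
    }

    public static void main(String[] args) throws IOException {
        MacroSource good = () -> Collections.singletonMap("Module1", "Sub Hello()\nEnd Sub");
        MacroSource buggy = () -> { throw new NullPointerException("reader bug"); };
        System.out.println(readMacrosLeniently(good).keySet());   // [Module1]
        System.out.println(readMacrosLeniently(buggy).isEmpty()); // true
    }
}
```

Only RuntimeException is caught here, so a genuine I/O failure is still reported rather than silently producing an empty result.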
      • getSupportedTypes

        public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
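getSupportedTypes returns the media types this parser accepts. A caller checking support usually compares the base form of a type, because a detected type may carry parameters after a `;`. Tika's MediaType offers getBaseType() for this. The stdlib-only sketch below uses plain strings instead, and its three-entry set is an illustrative assumption, not the parser's real list; the `; foo=bar` parameter in the usage line is likewise hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class SupportedTypeCheck {

    // Illustrative subset only; the real set comes from getSupportedTypes(context).
    public static final Set<String> SUPPORTED = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "application/msword",
            "application/vnd.ms-excel",
            "application/vnd.ms-powerpoint")));

    /** Strips any parameters after ';', trims whitespace, and lowercases the type. */
    public static String baseType(String mediaType) {
        int semi = mediaType.indexOf(';');
        String base = semi >= 0 ? mediaType.substring(0, semi) : mediaType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns true if the base form of mediaType appears in the supported set. */
    public static boolean isSupported(String mediaType, Set<String> supported) {
        return supported.contains(baseType(mediaType));
    }

    public static void main(String[] args) {
        // "foo=bar" is a hypothetical parameter used only to show that it is ignored.
        System.out.println(isSupported("application/vnd.ms-excel; foo=bar", SUPPORTED)); // true
        System.out.println(isSupported("application/pdf", SUPPORTED));                  // false
    }
}
```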
      • parse

        public void parse(InputStream stream,
                          ContentHandler handler,
                          org.apache.tika.metadata.Metadata metadata,
                          org.apache.tika.parser.ParseContext context)
                   throws IOException,
                          SAXException,
                          org.apache.tika.exception.TikaException
        Extracts properties and text from an MS Office document input stream.
        Throws:
        IOException
        SAXException
        org.apache.tika.exception.TikaException
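As with other Tika parsers, parse reports the extracted text to `handler` as XHTML SAX events and records document properties in `metadata`; Tika's BodyContentHandler is the usual way to keep only the body text. The stdlib-only sketch below shows what a minimal text-collecting ContentHandler does with such events. The handler is a simplified assumption, not Tika's BodyContentHandler, and the XHTML string is a hypothetical stand-in for real parser output.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Collects character data inside <body>, ending each <p> with a newline. */
public class BodyTextCollector extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    private boolean inBody = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("body".equals(localName)) {
            inBody = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("body".equals(localName)) {
            inBody = false;
        } else if (inBody && "p".equals(localName)) {
            text.append('\n'); // mark the paragraph boundary
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inBody) {
            text.append(ch, start, length); // text in <head>, e.g. <title>, is ignored
        }
    }

    public String getText() {
        return text.toString();
    }

    /** Feeds an XHTML string through a namespace-aware SAX parser into a new collector. */
    public static String collect(String xhtml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is populated
        SAXParser parser = factory.newSAXParser();
        BodyTextCollector handler = new BodyTextCollector();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Report</title></head>"
                + "<body><p>Hello</p><p>World</p></body></html>";
        System.out.print(collect(xhtml)); // prints "Hello" and "World" on separate lines
    }
}
```

In real use, an instance like this (or a BodyContentHandler) would be passed as the `handler` argument to parse.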
      • parse

        protected void parse(org.apache.poi.poifs.filesystem.DirectoryNode root,
                             org.apache.tika.parser.ParseContext context,
                             org.apache.tika.metadata.Metadata metadata,
                             org.apache.tika.sax.XHTMLContentHandler xhtml)
                      throws IOException,
                             SAXException,
                             org.apache.tika.exception.TikaException
        Throws:
        IOException
        SAXException
        org.apache.tika.exception.TikaException