Class OfficeParser

    • Constructor Detail

      • OfficeParser

        public OfficeParser()
    • Method Detail

      • extractMacros

        public static void extractMacros(org.apache.poi.poifs.filesystem.POIFSFileSystem fs,
                                         ContentHandler xhtml,
                                         org.apache.tika.extractor.EmbeddedDocumentExtractor embeddedDocumentExtractor)
                                  throws IOException,
                                         SAXException
        Helper to extract macros from an NPOIFS/vbaProject.bin.

        As of POI-3.15-final, there are still some bugs in VBAMacroReader. For now, NPEs and other runtime exceptions are swallowed.

        Parameters:
        fs - NPOIFS to extract from
        xhtml - SAX writer
        embeddedDocumentExtractor - extractor for embedded documents
        Throws:
        IOException - if an I/O error occurs while extracting the embedded document
        SAXException - if writing to xhtml fails
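The "swallow NPE and other runtime exceptions" behavior noted for extractMacros can be illustrated with a minimal sketch. It uses only the Java standard library: the `Supplier` stands in for a macro reader and `readMacrosLeniently` is a hypothetical helper, neither of which is part of Tika or POI. The checked exceptions in the real signature (IOException, SAXException) are not modeled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class MacroExtractionSketch {

    /**
     * Reads macros from a hypothetical reader, swallowing NPE and other
     * runtime exceptions so that a reader bug does not abort the caller.
     */
    public static List<String> readMacrosLeniently(Supplier<List<String>> reader) {
        List<String> macros = new ArrayList<>();
        try {
            macros.addAll(reader.get());
        } catch (RuntimeException e) {
            // Deliberately ignored: return whatever was collected so far.
        }
        return macros;
    }

    public static void main(String[] args) {
        // A reader that works, and one that simulates an internal bug.
        Supplier<List<String>> healthy = () -> Collections.singletonList("Sub Hello()");
        Supplier<List<String>> buggy = () -> {
            throw new NullPointerException("simulated reader bug");
        };

        System.out.println(readMacrosLeniently(healthy)); // prints [Sub Hello()]
        System.out.println(readMacrosLeniently(buggy));   // prints []
    }
}
```

The trade-off in this style is that a failing reader is indistinguishable from a document with no macros, which is acceptable only when macro extraction is best-effort.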
      • getSupportedTypes

        public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
      • parse

        public void parse(InputStream stream,
                          ContentHandler handler,
                          org.apache.tika.metadata.Metadata metadata,
                          org.apache.tika.parser.ParseContext context)
                   throws IOException,
                          SAXException,
                          org.apache.tika.exception.TikaException
        Extracts properties and text from an MS Document input stream.
        Throws:
        IOException
        SAXException
        org.apache.tika.exception.TikaException
      • parse

        protected void parse(org.apache.poi.poifs.filesystem.DirectoryNode root,
                             org.apache.tika.parser.ParseContext context,
                             org.apache.tika.metadata.Metadata metadata,
                             org.apache.tika.sax.XHTMLContentHandler xhtml)
                      throws IOException,
                             SAXException,
                             org.apache.tika.exception.TikaException
        Throws:
        IOException
        SAXException
        org.apache.tika.exception.TikaException
      • getUCEntry

        public static org.apache.poi.poifs.filesystem.Entry getUCEntry(org.apache.poi.poifs.filesystem.DirectoryEntry root,
                                                                       String ucTarget)
        Looks for an entry directly within root (non-recursive) whose upper-cased name equals ucTarget.
        Parameters:
        root - the directory entry whose direct children are searched
        ucTarget - the upper-cased entry name to match
        Returns:
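The lookup described for getUCEntry (a non-recursive scan that upper-cases each child name and compares it with ucTarget) can be sketched on plain strings. This is an illustration under stated assumptions, not the library's implementation: `findUpperCased` is a hypothetical name, a `List<String>` stands in for the directory's direct children, and returning null when nothing matches is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class UpperCaseLookupSketch {

    /**
     * Returns the first direct child name whose upper-cased form equals
     * ucTarget, or null if none matches (null is this sketch's assumption).
     */
    public static String findUpperCased(List<String> childNames, String ucTarget) {
        for (String name : childNames) {
            // Locale.ROOT keeps the upper-casing independent of the default locale.
            if (name.toUpperCase(Locale.ROOT).equals(ucTarget)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("WordDocument", "Macros", "_VBA_PROJECT_CUR");

        System.out.println(findUpperCased(children, "MACROS")); // prints Macros
        System.out.println(findUpperCased(children, "macros")); // prints null
    }
}
```

Only the stored names are upper-cased, so the caller is expected to pass a target that is already upper-case, as the second call shows.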