public class OfficeParser extends AbstractOfficeParser
| Modifier and Type | Class and Description |
|---|---|
| static class | OfficeParser.POIFSDocumentType |
| Constructor and Description |
|---|
| OfficeParser() |
| Modifier and Type | Method and Description |
|---|---|
| static void | extractMacros(org.apache.poi.poifs.filesystem.POIFSFileSystem fs, ContentHandler xhtml, org.apache.tika.extractor.EmbeddedDocumentExtractor embeddedDocumentExtractor) Helper to extract macros from an NPOIFS/vbaProject.bin. As of POI-3.15-final, there are still some bugs in VBAMacroReader. |
| Set<org.apache.tika.mime.MediaType> | getSupportedTypes(org.apache.tika.parser.ParseContext context) |
| static org.apache.poi.poifs.filesystem.Entry | getUCEntry(org.apache.poi.poifs.filesystem.DirectoryEntry root, String ucTarget) Looks for entry within root (non-recursive) that has an upper-cased name that equals ucTarget. |
| protected void | parse(org.apache.poi.poifs.filesystem.DirectoryNode root, org.apache.tika.parser.ParseContext context, org.apache.tika.metadata.Metadata metadata, org.apache.tika.sax.XHTMLContentHandler xhtml) |
| void | parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) Extracts properties and text from an MS Document input stream. |
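The lookup described for getUCEntry (compare the upper-cased name of each direct child against ucTarget, without descending into subdirectories) can be sketched with plain strings standing in for POIFS entries. This is a minimal illustration, not the Tika implementation: the class UcEntryLookup, the method findUcEntry, the use of Locale.ROOT, and the choice to return null when nothing matches are all assumptions made for the example.

```java
import java.util.List;
import java.util.Locale;

public class UcEntryLookup {

    // Hypothetical helper: returns the first direct child name whose
    // upper-cased form equals ucTarget, or null if none matches.
    // Plain strings stand in for POIFS entries; nested directories
    // are never visited, mirroring the "non-recursive" description.
    public static String findUcEntry(List<String> childNames, String ucTarget) {
        for (String name : childNames) {
            // Locale.ROOT keeps the case mapping independent of the default locale.
            if (name.toUpperCase(Locale.ROOT).equals(ucTarget)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> children = List.of("WordDocument", "Macros", "1Table");
        System.out.println(findUcEntry(children, "MACROS")); // prints Macros
        System.out.println(findUcEntry(children, "VBA"));    // prints null
    }
}
```

Note that the caller is expected to pass ucTarget already upper-cased; only the entry names are converted during the comparison.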
Methods inherited from class AbstractOfficeParser:
configure, getByteArrayMaxOverride, getDateFormatOverride, isConcatenatePhoneticRuns, isExtractAllAlternativesFromMSG, isExtractMacros, isIncludeDeletedContent, isIncludeHeadersAndFooters, isIncludeMoveFromContent, isIncludeShapeBasedContent, isUseSAXDocxExtractor, isUseSAXPptxExtractor, setByteArrayMaxOverride, setConcatenatePhoneticRuns, setDateFormatOverride, setExtractAllAlternativesFromMSG, setExtractMacros, setIncludeDeletedContent, setIncludeHeadersAndFooters, setIncludeMoveFromContent, setIncludeShapeBasedContent, setUseSAXDocxExtractor, setUseSAXPptxExtractor

public static void extractMacros(org.apache.poi.poifs.filesystem.POIFSFileSystem fs,
ContentHandler xhtml,
org.apache.tika.extractor.EmbeddedDocumentExtractor embeddedDocumentExtractor)
throws IOException,
SAXException
Helper to extract macros from an NPOIFS/vbaProject.bin. As of POI-3.15-final, there are still some bugs in VBAMacroReader, so for now NullPointerException and other runtime exceptions are swallowed.
Parameters:
fs - NPOIFS to extract from
xhtml - SAX writer
embeddedDocumentExtractor - extractor for embedded documents
Throws:
IOException - if an I/O error occurs during extraction of the embedded document
SAXException - if writing to xhtml fails

public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
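The exception handling described for extractMacros (runtime exceptions from the macro reader are swallowed, while the declared checked exceptions still reach the caller) might look roughly like the sketch below. It uses only the standard library: MacroSource and readMacrosLeniently are hypothetical names invented for the example and are not POI or Tika types, and returning an empty map on failure is an assumption about what "swallowing" could mean.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Map;

public class MacroSwallowSketch {

    // Hypothetical stand-in for a macro reader; not a POI or Tika type.
    public interface MacroSource {
        Map<String, String> readMacros() throws IOException;
    }

    // Returns the macros, or an empty map if the reader fails with a
    // runtime exception such as NullPointerException. A checked
    // IOException is not caught here and propagates to the caller.
    public static Map<String, String> readMacrosLeniently(MacroSource source) throws IOException {
        try {
            return source.readMacros();
        } catch (RuntimeException e) {
            // Assumption: a reader bug should not abort the surrounding parse.
            return Collections.emptyMap();
        }
    }

    public static void main(String[] args) throws IOException {
        MacroSource healthy = () -> Map.of("Module1", "Sub Hello()\nEnd Sub");
        MacroSource buggy = () -> { throw new NullPointerException("simulated reader bug"); };
        System.out.println(readMacrosLeniently(healthy).keySet()); // prints [Module1]
        System.out.println(readMacrosLeniently(buggy).isEmpty());  // prints true
    }
}
```

The trade-off in this pattern is that a malformed macro stream yields no macros instead of a failed parse, so callers cannot tell "no macros present" from "macros unreadable" without additional logging.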
public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
Extracts properties and text from an MS Document input stream.
Throws:
IOException
SAXException
org.apache.tika.exception.TikaException

protected void parse(org.apache.poi.poifs.filesystem.DirectoryNode root,
org.apache.tika.parser.ParseContext context,
org.apache.tika.metadata.Metadata metadata,
org.apache.tika.sax.XHTMLContentHandler xhtml)
throws IOException,
SAXException,
org.apache.tika.exception.TikaException
Throws:
IOException
SAXException
org.apache.tika.exception.TikaException

public static org.apache.poi.poifs.filesystem.Entry getUCEntry(org.apache.poi.poifs.filesystem.DirectoryEntry root,
String ucTarget)
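Looks for entry within root (non-recursive) that has an upper-cased name that equals ucTarget.
Parameters:
root - directory entry whose direct children are searched
ucTarget - upper-cased name to match

Copyright © 2007–2025 The Apache Software Foundation. All rights reserved.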