Package org.apache.tika.parser.pkg
Class PackageParser
- java.lang.Object
  - org.apache.tika.parser.AbstractEncodingDetectorParser
    - org.apache.tika.parser.pkg.PackageParser
- All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser
public class PackageParser extends org.apache.tika.parser.AbstractEncodingDetectorParser
Parser for various packaging formats. Package entries are written to the XHTML event stream as <div class="package-entry"> elements, each containing the (optional) entry name as an <h1> element and the full structured body content of the parsed entry.
For encryption to work with 7Z files, the user must have the JCE Unlimited Strength jars installed (see COMPRESS-299 and TIKA-1521). If the jars are not installed, an IOException is thrown, potentially wrapped in a TikaException.
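As a minimal sketch of consuming the event stream described above, the handler below collects entry names by looking for an <h1> that is the first child element of a <div class="package-entry">. The class name PackageEntryNameCollector is hypothetical and not part of Tika. It uses only org.xml.sax from the JDK, and it assumes the class attribute can be looked up by the qualified name "class".

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper (not part of Tika): collects package entry names. */
final class PackageEntryNameCollector extends DefaultHandler {
    private final List<String> names = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean expectName; // previous start tag was <div class="package-entry">
    private boolean inName;     // currently inside the entry's <h1>

    private static String elementName(String localName, String qName) {
        // Fall back to qName when the producer does not supply local names.
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String name = elementName(localName, qName);
        boolean entryDiv = "div".equals(name)
                && atts != null
                && "package-entry".equals(atts.getValue("class"));
        // Assumption: the entry name, when present, is the first child of the div.
        if (expectName && "h1".equals(name)) {
            inName = true;
            text.setLength(0);
        }
        expectName = entryDiv;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inName) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inName && "h1".equals(elementName(localName, qName))) {
            names.add(text.toString());
            inName = false;
        }
    }

    /** Entry names in the order they were seen. */
    List<String> getNames() {
        return names;
    }
}
```

An instance of this handler could be passed as the ContentHandler argument of parse(...). Entries without a name produce no <h1> and are therefore skipped by this sketch.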
- See Also:
- Serialized Form
-
Constructor Summary
Constructors
- PackageParser()
- PackageParser(org.apache.tika.detect.EncodingDetector encodingDetector)
-
Method Summary
Modifier and Type, Method, Description
- Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
- protected static org.apache.tika.metadata.Metadata handleEntryMetadata(String name, Date createAt, Date modifiedAt, Long size, org.apache.tika.sax.XHTMLContentHandler xhtml)
- boolean isDetectCharsetsInEntryNames()
- void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context)
- void setDetectCharsetsInEntryNames(boolean detectCharsetsInEntryNames): Whether or not to run the default charset detector against entry names in ZipFiles.
-
Method Detail
-
handleEntryMetadata
protected static org.apache.tika.metadata.Metadata handleEntryMetadata(String name, Date createAt, Date modifiedAt, Long size, org.apache.tika.sax.XHTMLContentHandler xhtml) throws SAXException, IOException, org.apache.tika.exception.TikaException
- Throws:
SAXException, IOException, org.apache.tika.exception.TikaException
-
getSupportedTypes
public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
-
parse
public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
- Throws:
IOException, SAXException, org.apache.tika.exception.TikaException
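The class description notes that a missing-JCE IOException may arrive wrapped in a TikaException. The following is a minimal sketch of how a caller might recover the underlying IOException from any wrapper by walking the cause chain. CauseFinder and findIOException are hypothetical names, not Tika API, and the depth limit of 32 is an arbitrary guard chosen for this illustration.

```java
import java.io.IOException;

/** Hypothetical helper (not part of Tika): locates a wrapped IOException. */
final class CauseFinder {
    private CauseFinder() {
    }

    /** Returns the first IOException in t's cause chain, or null if there is none. */
    static IOException findIOException(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++) {
            if (t instanceof IOException) {
                return (IOException) t;
            }
            t = t.getCause();
        }
        return null;
    }
}
```

A caller could invoke CauseFinder.findIOException(e) inside a catch block for TikaException around parse(...) and treat a non-null result the same way as a directly thrown IOException.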
-
setDetectCharsetsInEntryNames
@Field public void setDetectCharsetsInEntryNames(boolean detectCharsetsInEntryNames)
Whether or not to run the default charset detector against entry names in ZipFiles. The default is true.
- Parameters:
detectCharsetsInEntryNames - whether to run the default charset detector against entry names
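To illustrate why the charset of entry names matters, the sketch below decodes the same raw name bytes under two charsets. It is not Tika's detector; it only shows the mismatch that detection is meant to avoid. EntryNameDecoding and decode are hypothetical names, and the ISO-8859-1 sample name is an assumption made for the demonstration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration (not part of Tika): decoding raw entry-name bytes. */
final class EntryNameDecoding {
    private EntryNameDecoding() {
    }

    /** Decodes raw entry-name bytes using the given charset. */
    static String decode(byte[] rawName, Charset charset) {
        return new String(rawName, charset);
    }

    /** Sample name containing non-ASCII characters, encoded as ISO-8859-1. */
    static byte[] sampleRawName() {
        return "r\u00e9sum\u00e9.txt".getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = sampleRawName();
        // Correct charset: the original name is recovered.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1));
        // Wrong charset: the accented characters are not recovered.
        System.out.println(decode(raw, StandardCharsets.UTF_8));
    }
}
```

With detection disabled via setDetectCharsetsInEntryNames(false), no such guess is made for entry names, which may be preferable when the archive's encoding is already known.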
-
isDetectCharsetsInEntryNames
public boolean isDetectCharsetsInEntryNames()