Class PackageParser

  • All Implemented Interfaces:
    Serializable, org.apache.tika.parser.Parser

    public class PackageParser
    extends org.apache.tika.parser.AbstractEncodingDetectorParser
    Parser for various packaging formats. Package entries are written to the XHTML event stream as <div class="package-entry"> elements, each containing the (optional) entry name as an <h1> element followed by the full structured body content of the parsed entry.

    Users must have the JCE Unlimited Strength jars installed for encryption to work with 7Z files (see COMPRESS-299 and TIKA-1521). If the jars are not installed, an IOException will be thrown, and it may be wrapped in a TikaException.
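The output shape described above can be illustrated with a small standard-library sketch. It is not Tika's implementation and does not call Tika: the class PackageEntrySketch and its methods sampleZip, escape, and render are hypothetical names, and wrapping each entry body in a <p> element is an assumption made only to keep the illustration short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not part of Tika.
public class PackageEntrySketch {

    /** Builds a small in-memory ZIP with two text entries to use as input. */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("b.txt"));
            zip.write("1 < 2".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Escapes the three characters that would otherwise break the markup. */
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Emits one package-entry div per archive entry: name as h1, then the body. */
    public static String render(byte[] zipBytes) throws IOException {
        StringBuilder out = new StringBuilder();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // readAllBytes stops at the end of the current entry.
                String body = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                out.append("<div class=\"package-entry\">")
                   .append("<h1>").append(escape(entry.getName())).append("</h1>")
                   .append("<p>").append(escape(body)).append("</p>") // assumed body wrapper
                   .append("</div>\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(render(sampleZip()));
    }
}
```

In the real parser the body of each div is whatever structured content the embedded-document parser produces for that entry, so it will generally be richer than a single paragraph.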

    See Also:
    Serialized Form
    • Constructor Detail

      • PackageParser

        public PackageParser()
      • PackageParser

        public PackageParser(org.apache.tika.detect.EncodingDetector encodingDetector)
    • Method Detail

      • handleEntryMetadata

        protected static org.apache.tika.metadata.Metadata handleEntryMetadata(String name,
                                                                               Date createAt,
                                                                               Date modifiedAt,
                                                                               Long size,
                                                                               org.apache.tika.sax.XHTMLContentHandler xhtml)
                                                                        throws SAXException,
                                                                               IOException,
                                                                               org.apache.tika.exception.TikaException
        Throws:
        SAXException
        IOException
        org.apache.tika.exception.TikaException
      • getSupportedTypes

        public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
      • setDetectCharsetsInEntryNames

        @Field
        public void setDetectCharsetsInEntryNames(boolean detectCharsetsInEntryNames)
        Sets whether to run the default charset detector against entry names in ZIP files. The default is true.
        Parameters:
        detectCharsetsInEntryNames - true to run charset detection on entry names, false to skip it
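Why the charset of entry names has to be guessed at all can be shown with a standard-library sketch. A ZIP entry name is stored as raw bytes, and unless the archive's UTF-8 flag is set, a reader cannot tell which charset produced them. The class EntryNameCharsetSketch and its methods are hypothetical names; the sketch does not use Tika or its detector, and only demonstrates that the same bytes decode to different names under different charsets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not part of Tika.
public class EntryNameCharsetSketch {

    /** Writes a ZIP whose single entry name is stored as UTF-8 bytes without the UTF-8 flag. */
    public static byte[] zipWithUnflaggedUtf8Name(String name) throws IOException {
        // Reinterpreting the UTF-8 bytes as ISO-8859-1 and writing through an
        // ISO-8859-1 stream stores the original UTF-8 bytes unchanged, while the
        // writer leaves the UTF-8 flag unset because its charset is not UTF-8.
        String raw = new String(name.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes, StandardCharsets.ISO_8859_1)) {
            zip.putNextEntry(new ZipEntry(raw));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Reads the first entry name, decoding the stored bytes with the given charset. */
    public static String firstEntryName(byte[] zipBytes, Charset charset) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes), charset)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipWithUnflaggedUtf8Name("caf\u00e9.txt");
        // Correct guess: the name round-trips.
        System.out.println(firstEntryName(zip, StandardCharsets.UTF_8));
        // Wrong guess: the two UTF-8 bytes of the accented letter become two characters.
        System.out.println(firstEntryName(zip, StandardCharsets.ISO_8859_1));
    }
}
```

With Tika on the classpath, detection would be turned off by calling setDetectCharsetsInEntryNames(false) on a PackageParser instance before parsing; what the parser then does with undeclared entry-name bytes is not specified on this page.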
      • isDetectCharsetsInEntryNames

        public boolean isDetectCharsetsInEntryNames()