Class TikaProcessor.Builder

    • Constructor Detail

      • Builder

        public Builder()
    • Method Detail

      • appendingSuffix

        public TikaProcessor.Builder appendingSuffix(java.lang.String suffix)
        Add a suffix to the names of all fields output by Tika.
        Parameters:
        suffix - the suffix to add
        Returns:
        This builder for further configuration
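The following is a minimal, hypothetical sketch of the documented effect of appendingSuffix: every field name produced by Tika gets the suffix appended. The class SuffixSketch, its applySuffix method, and the "_tika" suffix are illustrative assumptions, not JesterJ's implementation. Content-Type and dc:title are ordinary Tika metadata names used here only as sample input.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: mirrors the documented effect of appendingSuffix
// by renaming every extracted field. Not JesterJ's actual code.
final class SuffixSketch {
    /** Returns a copy of fields in which every name ends with the given suffix. */
    static Map<String, String> applySuffix(Map<String, String> fields, String suffix) {
        Map<String, String> renamed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            renamed.put(e.getKey() + suffix, e.getValue());
        }
        return renamed;
    }

    public static void main(String[] args) {
        Map<String, String> tikaOutput = new LinkedHashMap<>();
        tikaOutput.put("Content-Type", "application/pdf");
        tikaOutput.put("dc:title", "Quarterly Report");
        // Prints {Content-Type_tika=application/pdf, dc:title_tika=Quarterly Report}
        System.out.println(applySuffix(tikaOutput, "_tika"));
    }
}
```

A suffix like this keeps Tika-derived fields from colliding with fields that other processing steps may already have set on the document.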
      • truncatingTextTo

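        public TikaProcessor.Builder truncatingTextTo(int chars)
        Convenience override for a safety valve that guards against large documents by capping the text extracted by Tika at the given number of characters. By default this is set to -1, meaning there is no limit on the amount of data processed with Tika.
        Parameters:
        chars - the maximum number of characters, or -1 for no limit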
        Returns:
        This builder for further configuration
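A minimal sketch of the truncation rule described above, assuming the limit counts Java chars of extracted text and that any negative value, like the default -1, disables it. TruncateSketch and truncate are hypothetical names, and this page does not say how TikaProcessor applies the limit internally. For comparison, Tika's own BodyContentHandler(int writeLimit) uses the same convention of -1 meaning unlimited.

```java
// Hypothetical sketch of the documented truncatingTextTo semantics.
// Assumption: a negative limit (the default is -1) means "no limit".
final class TruncateSketch {
    static String truncate(String extracted, int chars) {
        if (chars < 0 || extracted.length() <= chars) {
            return extracted; // unlimited, or already within the limit
        }
        return extracted.substring(0, chars); // keep only the first `chars` chars
    }

    public static void main(String[] args) {
        String text = "The quick brown fox";
        System.out.println(truncate(text, -1)); // The quick brown fox
        System.out.println(truncate(text, 9));  // The quick
    }
}
```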
      • replacingRawData

        public TikaProcessor.Builder replacingRawData(boolean replaceRaw)
        Specify whether the results of Tika's analysis should replace the raw document content.
        Parameters:
        replaceRaw - if true, the original content of the document is overwritten by Tika's extracted output.
        Returns:
        This builder for further configuration
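To illustrate replacingRawData, here is a hypothetical sketch in which a document is reduced to a byte array of raw content. The Doc class, the apply method, and the choice of UTF-8 for the replacement bytes are assumptions made for the example; JesterJ's real document type and encoding handling may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of a document holding raw bytes (not JesterJ's real type).
final class ReplaceRawSketch {
    static final class Doc {
        byte[] raw;

        Doc(byte[] raw) {
            this.raw = raw;
        }
    }

    /** Overwrites the raw content with the extracted text only when replaceRaw is true. */
    static void apply(Doc doc, String extractedText, boolean replaceRaw) {
        if (replaceRaw) {
            // Assumption: the extracted text is stored as UTF-8 bytes.
            doc.raw = extractedText.getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        Doc kept = new Doc(new byte[] {0x25, 0x50, 0x44, 0x46}); // "%PDF" file header
        Doc replaced = new Doc(new byte[] {0x25, 0x50, 0x44, 0x46});
        apply(kept, "Hello", false);
        apply(replaced, "Hello", true);
        System.out.println(new String(kept.raw, StandardCharsets.US_ASCII)); // %PDF
        System.out.println(new String(replaced.raw, StandardCharsets.UTF_8)); // Hello
    }
}
```

Leaving replaceRaw false keeps the original bytes available to later steps, at the cost of carrying both the binary and the extracted text.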
      • intoField

        public TikaProcessor.Builder intoField(java.lang.String field)
        Send the results of Tika's text extraction (the extracted text only, not the metadata) into the supplied field.
        Parameters:
        field - the name of the field to contain the extracted text.
        Returns:
        This builder for further configuration
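A hypothetical sketch of how intoField separates the two kinds of Tika output, modelling a document as a simple map from field names to values. The route method, the "extracted_text" field name, and the assumption that metadata is kept as separate fields under its own names are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: a document is a map of field name -> value.
final class IntoFieldSketch {
    /** Puts the extracted body text, and only that text, into textField. */
    static Map<String, String> route(String extractedText,
                                     Map<String, String> metadata,
                                     String textField) {
        // Assumption: metadata stays in separate fields under its own names.
        Map<String, String> fields = new LinkedHashMap<>(metadata);
        fields.put(textField, extractedText);
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Content-Type", "application/pdf");
        Map<String, String> doc = route("Hello world", metadata, "extracted_text");
        // Prints {Content-Type=application/pdf, extracted_text=Hello world}
        System.out.println(doc);
    }
}
```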
      • configuredWith

        public TikaProcessor.Builder configuredWith(org.w3c.dom.Document config)
                                             throws org.apache.tika.exception.TikaException,
                                                    java.io.IOException
        Specify a Tika configuration via an XML document that you have loaded from the filesystem, the classpath, or another source of your choice.
        Parameters:
        config - The configuration
        Returns:
        This builder for further configuration
        Throws:
        org.apache.tika.exception.TikaException - if Tika rejects the supplied configuration
        java.io.IOException - if an I/O error occurs while Tika loads resources needed by the configuration
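A sketch of producing the org.w3c.dom.Document that configuredWith expects, using only the JDK's DOM parser. The inline XML is a minimal tika-config that registers Tika's DefaultParser; the class TikaConfigLoader and its load method are hypothetical, and in practice the stream could equally come from a file or from getResourceAsStream on the classpath. The final call appears only as a comment because it needs JesterJ and Tika on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class TikaConfigLoader {
    // Minimal Tika configuration that uses Tika's DefaultParser (illustrative only).
    static final String CONFIG_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<properties><parsers>"
            + "<parser class=\"org.apache.tika.parser.DefaultParser\"/>"
            + "</parsers></properties>";

    /** Parses an XML stream into a DOM Document; parse failures become unchecked. */
    static Document load(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(in);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse Tika config XML", e);
        }
    }

    public static void main(String[] args) {
        Document config = load(new ByteArrayInputStream(
                CONFIG_XML.getBytes(StandardCharsets.UTF_8)));
        System.out.println(config.getDocumentElement().getTagName()); // properties
        // With JesterJ on the classpath, the parsed document would then be supplied via
        //   new TikaProcessor.Builder().configuredWith(config);
        // which may throw TikaException or IOException as documented for configuredWith.
    }
}
```

Wrapping parse errors in an unchecked exception keeps them separate from the TikaException and IOException that configuredWith itself declares.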