Package org.jesterj.ingest.processors
Class TikaProcessor.Builder
- java.lang.Object
-
- org.jesterj.ingest.model.impl.NamedBuilder<TikaProcessor>
-
- org.jesterj.ingest.processors.TikaProcessor.Builder
-
- All Implemented Interfaces:
Buildable<TikaProcessor>, ConfiguredBuildable<TikaProcessor>
- Enclosing class:
- TikaProcessor
public static class TikaProcessor.Builder extends NamedBuilder<TikaProcessor>
-
-
Constructor Summary
Constructor    Description
Builder()
-
Method Summary
Modifier and Type    Method    Description
TikaProcessor.Builder appendingSuffix(java.lang.String suffix) - Add a suffix to all field names output by Tika.
TikaProcessor build()
TikaProcessor.Builder configuredWith(org.w3c.dom.Document config) - Specify a Tika configuration via an XML document you have loaded via filesystem/classpath or other method of your choice.
protected TikaProcessor getObj()
TikaProcessor.Builder intoField(java.lang.String field) - Send the results of Tika's text extraction (text extracted but not metadata) into the supplied field.
TikaProcessor.Builder named(java.lang.String name)
TikaProcessor.Builder replacingRawData(boolean replaceRaw) - Specify whether the results of Tika's analysis should replace the raw document content.
TikaProcessor.Builder truncatingTextTo(int chars) - Convenience override for safety valve to guard against large documents.
Methods inherited from class org.jesterj.ingest.model.impl.NamedBuilder
isValid
Method Detail
-
getObj
protected TikaProcessor getObj()
- Overrides:
getObj in class NamedBuilder<TikaProcessor>
-
named
public TikaProcessor.Builder named(java.lang.String name)
- Specified by:
named in class NamedBuilder<TikaProcessor>
-
appendingSuffix
public TikaProcessor.Builder appendingSuffix(java.lang.String suffix)
Add a suffix to all field names output by Tika.
- Parameters:
suffix - the suffix to add
- Returns:
- This builder for further configuration
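
As a rough illustration of what suffixing field names means, the sketch below renames every key in a map of extracted fields. It assumes plain concatenation of field name and suffix; the class `SuffixSketch`, the helper `withSuffix`, the sample field names, and the `_tk` suffix are all hypothetical and are not part of JesterJ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SuffixSketch {

    // Hypothetical helper, not JesterJ code: returns a copy of the fields
    // with the suffix appended to every field name (assumed: simple concatenation).
    static Map<String, String> withSuffix(Map<String, String> fields, String suffix) {
        Map<String, String> renamed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            renamed.put(e.getKey() + suffix, e.getValue());
        }
        return renamed;
    }

    public static void main(String[] args) {
        // Sample metadata of the kind Tika might emit; values are made up.
        Map<String, String> tikaFields = new LinkedHashMap<>();
        tikaFields.put("Content-Type", "application/pdf");
        tikaFields.put("Author", "Jane Doe");

        System.out.println(withSuffix(tikaFields, "_tk"));
        // prints {Content-Type_tk=application/pdf, Author_tk=Jane Doe}
    }
}
```

A suffix of this kind is typically chosen to keep Tika-derived fields from colliding with fields the document already carries.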
-
truncatingTextTo
public TikaProcessor.Builder truncatingTextTo(int chars)
Convenience override for the safety valve that guards against large documents. Defaults to -1, which places no limit on the amount of data to process with Tika.
- Parameters:
chars - The limit
- Returns:
- This builder for further configuration
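
The "-1 means no limit" convention can be sketched as follows. This is only an illustration of the documented contract under the assumption that the limit counts characters of extracted text; the class `TruncationSketch` and the helper `truncate` are hypothetical, and whether JesterJ enforces the limit during or after extraction is not stated here.

```java
class TruncationSketch {

    // Hypothetical helper, not JesterJ code: a negative limit (such as the
    // default -1) leaves the text untouched; otherwise keep at most `chars` characters.
    static String truncate(String text, int chars) {
        if (chars < 0 || text.length() <= chars) {
            return text;
        }
        return text.substring(0, chars);
    }

    public static void main(String[] args) {
        System.out.println(truncate("extracted text", -1)); // prints "extracted text"
        System.out.println(truncate("extracted text", 9));  // prints "extracted"
    }
}
```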
-
replacingRawData
public TikaProcessor.Builder replacingRawData(boolean replaceRaw)
Specify whether the results of Tika's analysis should replace the raw document content.
- Parameters:
replaceRaw - if true the original content for the document will be overwritten by Tika's extracted output.
- Returns:
- This builder for further configuration
-
intoField
public TikaProcessor.Builder intoField(java.lang.String field)
Send the results of Tika's text extraction (text extracted but not metadata) into the supplied field.
- Parameters:
field - the name of the field to contain the extracted text.
- Returns:
- This builder for further configuration
-
configuredWith
public TikaProcessor.Builder configuredWith(org.w3c.dom.Document config) throws org.apache.tika.exception.TikaException, java.io.IOException
Specify a Tika configuration via an XML document you have loaded via filesystem/classpath or other method of your choice.
- Parameters:
config - The configuration
- Returns:
- This builder for further config
- Throws:
org.apache.tika.exception.TikaException - if Tika doesn't like your config
java.io.IOException - if Tika can't find something it needed?
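
Since configuredWith takes an org.w3c.dom.Document, one way to obtain it is the JDK's standard JAXP parser. The sketch below parses a small Tika-style configuration from a string; the class `TikaConfigLoading`, the helper `parseConfig`, and the sample XML are assumptions made for illustration, and the builder chain in the trailing comment uses only methods listed on this page with placeholder argument values.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class TikaConfigLoading {

    // Hypothetical minimal configuration; a real tika-config.xml is usually larger.
    static final String SAMPLE_CONFIG =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<properties>\n"
        + "  <parsers>\n"
        + "    <parser class=\"org.apache.tika.parser.DefaultParser\"/>\n"
        + "  </parsers>\n"
        + "</properties>\n";

    // Parses XML text into a DOM Document, wrapping checked exceptions
    // so callers do not have to declare them.
    static Document parseConfig(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse Tika config XML", e);
        }
    }

    public static void main(String[] args) {
        Document config = parseConfig(SAMPLE_CONFIG);
        System.out.println(config.getDocumentElement().getNodeName()); // prints "properties"

        // With JesterJ on the classpath, the document could then be handed to
        // the builder, for example (argument values are placeholders):
        //
        //   TikaProcessor processor = new TikaProcessor.Builder()
        //       .named("tika")
        //       .appendingSuffix("_tk")
        //       .truncatingTextTo(100000)
        //       .configuredWith(config)   // declares TikaException, IOException
        //       .build();
    }
}
```

To read the configuration from the filesystem or classpath instead, the same DocumentBuilder can parse a java.io.File or an InputStream.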
-
build
public TikaProcessor build()