All Classes and Interfaces

Class
Description
 
 
Simple class for AutoDetectParser.
Basic FileResourceConsumer that reads files from an input directory and writes content to the output directory.
 
FileResourceConsumers should throw this if something catastrophic has happened and the BatchProcess should shut down and not be restarted.
This is the main processor class for a single process.
 
Builds a BatchProcessor from a combination of runtime arguments and the config file.
 
 
Reads configurable options from a config file and returns an org.apache.commons.cli.Options object to be used in the command-line parser.
Simple interface around a collection of consumers that allows for initializing and shutting down shared resources.
Builds BasicContentHandler with type defined by attribute "basicHandlerType" with possible values: xml, html, text, body, ignore.
Functionality and naming conventions (roughly) copied from org.apache.commons.lang3 so that we didn't have to add another dependency.
This is a basic interface to handle a logical "file".
This is a base class for file consumers.
 
 
 
Builds either an FSDirectoryCrawler or an FSListCrawler.
 
 
Selector that chooses files based on their file name and their size, as determined by TikaCoreProperties.RESOURCE_NAME_KEY and Metadata.CONTENT_LENGTH.
FileSystem(FS)Resource wraps a file name.
Class that "crawls" a list of files.
 
 
 
Utility class to handle some common issues when reading from and writing to a file system (FS).
 
 
 
Stub interface to allow for different result types from different processors.
Class that waits for input on System.in.
Builds an Interrupter.
 
 
Same as ObjectFromDOMAndQueueBuilder, but this is for objects that require access to the shared queue.
Interface for things that build objects from a DOM Node and a map of runtime attributes.
 
 
 
 
Utility class to handle properties.
This runs a RecursiveParserWrapper against an input file and outputs the JSON metadata to an output file.
Interface for reporter builders.
 
Basic class to use for reporting status from both the crawler and the consumers.
 
Empty class for what a StatusReporter returns when it finishes.
Simple single-threaded class that calls tika-app against every file in a directory.
This uses the JsonStreamingSerializer to write out a single metadata object at a time.
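
The selector entry above describes choosing files by name and size. A minimal sketch of that idea in plain Java follows; it is not the library's implementation. The class name NameAndSizeSelector, its constructor parameters, and the use of a plain Map in place of a metadata object are all hypothetical, and the two key strings are assumed stand-ins for TikaCoreProperties.RESOURCE_NAME_KEY and Metadata.CONTENT_LENGTH.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch: accept or reject a file based on its name and size. */
public class NameAndSizeSelector {
    // Assumed stand-ins for TikaCoreProperties.RESOURCE_NAME_KEY and Metadata.CONTENT_LENGTH.
    static final String NAME_KEY = "resourceName";
    static final String LENGTH_KEY = "Content-Length";

    private final Pattern include;  // null means "no include filter"
    private final Pattern exclude;  // null means "no exclude filter"
    private final long minBytes;    // negative means "no lower bound"
    private final long maxBytes;    // negative means "no upper bound"

    public NameAndSizeSelector(Pattern include, Pattern exclude, long minBytes, long maxBytes) {
        this.include = include;
        this.exclude = exclude;
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
    }

    /** Returns true if the file described by the metadata map should be processed. */
    public boolean select(Map<String, String> metadata) {
        String name = metadata.get(NAME_KEY);
        if (name != null) {
            // Exclusion wins over inclusion in this sketch.
            if (exclude != null && exclude.matcher(name).find()) {
                return false;
            }
            if (include != null && !include.matcher(name).find()) {
                return false;
            }
        }
        String lengthString = metadata.get(LENGTH_KEY);
        if (lengthString != null) {
            try {
                long length = Long.parseLong(lengthString.trim());
                if (minBytes >= 0 && length < minBytes) {
                    return false;
                }
                if (maxBytes >= 0 && length > maxBytes) {
                    return false;
                }
            } catch (NumberFormatException e) {
                // Design choice of this sketch: an unparseable length skips the size check.
            }
        }
        return true;
    }

    public static void main(String[] args) {
        NameAndSizeSelector selector = new NameAndSizeSelector(
                Pattern.compile("\\.pdf$"), Pattern.compile("^~"), 1, 1_000_000);
        Map<String, String> metadata = new HashMap<>();
        metadata.put(NAME_KEY, "report.pdf");
        metadata.put(LENGTH_KEY, "2048");
        System.out.println(selector.select(metadata));
    }
}
```

In this sketch a missing name or length simply skips the corresponding check; a real selector might prefer to reject such files instead.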