All Classes and Interfaces
Class
Description
Simple class for AutoDetectParser
Basic FileResourceConsumer that reads files from an input
directory and writes content to the output directory.
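A minimal sketch, using only `java.nio`, of how a consumer like this might map each input file to a location under the output directory. The class `OutputPathMapper`, the `.txt` suffix, and the choice to mirror the input directory's relative layout are assumptions for illustration, not Tika's API. Content extraction itself is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Tika: mirrors an input file's location
// (relative to inputRoot) under outputRoot and appends a suffix.
class OutputPathMapper {
    static Path map(Path inputRoot, Path inputFile, Path outputRoot, String suffix) {
        // Relative path of the file inside the input directory, e.g. "a/b.pdf"
        Path relative = inputRoot.toAbsolutePath().normalize()
                .relativize(inputFile.toAbsolutePath().normalize());
        return outputRoot.resolve(relative.toString() + suffix);
    }

    // Writes already-extracted content to the mirrored location (assumed ".txt"),
    // creating any missing parent directories first.
    static Path write(Path inputRoot, Path inputFile, Path outputRoot, String content)
            throws IOException {
        Path out = map(inputRoot, inputFile, outputRoot, ".txt");
        Files.createDirectories(out.getParent());
        Files.write(out, content.getBytes(StandardCharsets.UTF_8));
        return out;
    }
}
```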
FileResourceConsumers should throw this if something
catastrophic has happened and the BatchProcess should shut down
and not be restarted.
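One way a driver could honor this contract, sketched with hypothetical names (`NoRestartError`, `RestartingDriver`, and the `maxRestarts` budget are not Tika classes or parameters): ordinary failures trigger a restart, while the no-restart error ends the loop immediately.

```java
import java.util.concurrent.Callable;

// Hypothetical marker: thrown when something catastrophic happens and the
// batch process must shut down without being restarted.
class NoRestartError extends Error {
    NoRestartError(String message) { super(message); }
}

// Hypothetical driver loop, not Tika's implementation: restarts the process on
// ordinary failures (up to maxRestarts times) but stops at once on NoRestartError.
class RestartingDriver {
    /** Returns how many times the process was started. */
    static int run(Callable<Void> process, int maxRestarts) {
        int starts = 0;
        while (true) {
            starts++;
            try {
                process.call();
                return starts;              // finished normally
            } catch (NoRestartError e) {
                return starts;              // catastrophic: shut down, do not restart
            } catch (Exception e) {
                if (starts > maxRestarts) {
                    return starts;          // restart budget exhausted
                }
                // otherwise loop around and start the process again
            }
        }
    }
}
```

Extending `Error` rather than `Exception` is a design choice in this sketch: a generic `catch (Exception e)` in consumer code cannot accidentally swallow the signal.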
This is the main processor class for a single process.
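The crawler-feeds-consumers shape of a single batch process can be sketched with a bounded `BlockingQueue` and a fixed thread pool. This is an assumption-laden illustration, not Tika's `BatchProcess`: plain strings stand in for file resources, and the poison-pill shutdown and the `MiniBatch` name are hypothetical.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: the calling thread acts as the crawler and fills a bounded
// queue; N consumer threads drain it until each one receives a poison pill.
class MiniBatch {
    // Assumed sentinel; no real file name is expected to equal it.
    private static final String POISON = "__POISON_PILL__";

    static int run(List<String> files, int numConsumers, Consumer<String> handler)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(numConsumers);
        for (int i = 0; i < numConsumers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        String f = queue.take();
                        if (f.equals(POISON)) {
                            return;                 // no more work for this consumer
                        }
                        try {
                            handler.accept(f);
                            processed.incrementAndGet();
                        } catch (RuntimeException e) {
                            // a real consumer would log the failure and keep going
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (String f : files) {
            queue.put(f);                           // blocks while the queue is full
        }
        for (int i = 0; i < numConsumers; i++) {
            queue.put(POISON);                      // one pill per consumer
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }
}
```

The bounded queue means a fast crawler blocks instead of buffering an unbounded list of files.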
Builds a BatchProcessor from a combination of runtime arguments and the
config file.
Reads configurable options from a config file and returns an org.apache.commons.cli.Options
object to be used by the command-line parser.
Simple interface around a collection of consumers that allows
for initializing and shutting down shared resources (e.g.
Builds a BasicContentHandler whose type is defined by the attribute "basicHandlerType";
possible values are xml, html, text, body, and ignore.
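A sketch of how the `basicHandlerType` attribute value might be validated, assuming case-insensitive matching and a caller-supplied fallback when the attribute is absent. The `BasicHandlerType` enum here is hypothetical and only mirrors the five documented values.

```java
import java.util.Locale;

// Hypothetical enum mirroring the documented values of "basicHandlerType";
// the matching rules and fallback behaviour are assumptions for illustration.
enum BasicHandlerType {
    XML, HTML, TEXT, BODY, IGNORE;

    /** Parses the attribute case-insensitively; returns fallback if null or blank. */
    static BasicHandlerType parse(String value, BasicHandlerType fallback) {
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "basicHandlerType must be one of xml, html, text, body, ignore; got: " + value, e);
        }
    }
}
```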
Functionality and naming conventions (roughly) copied from org.apache.commons.lang3
so that we didn't have to add another dependency.
This is a basic interface to handle a logical "file".
This is a base class for file consumers.
Builds either an FSDirectoryCrawler or an FSListCrawler.
Selector that chooses files based on their file name
and their size, as determined by TikaCoreProperties.RESOURCE_NAME_KEY and Metadata.CONTENT_LENGTH.
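A minimal sketch of a name-and-size selector. Here the resource name and content length are passed in directly rather than read from `TikaCoreProperties.RESOURCE_NAME_KEY` and `Metadata.CONTENT_LENGTH`; the include/exclude regular expressions and byte bounds are assumed configuration, and `NameAndSizeSelector` is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical selector, not Tika's implementation: accepts a file when its name
// matches the optional include pattern, does not match the optional exclude
// pattern, and its length lies within [minBytes, maxBytes].
class NameAndSizeSelector {
    private final Pattern include;  // null = accept any name
    private final Pattern exclude;  // null = exclude nothing
    private final long minBytes;
    private final long maxBytes;

    NameAndSizeSelector(Pattern include, Pattern exclude, long minBytes, long maxBytes) {
        this.include = include;
        this.exclude = exclude;
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
    }

    boolean select(String resourceName, long contentLength) {
        if (include != null && !include.matcher(resourceName).find()) return false;
        if (exclude != null && exclude.matcher(resourceName).find()) return false;
        return contentLength >= minBytes && contentLength <= maxBytes;
    }
}
```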
FileSystem(FS)Resource wraps a file name.
Class that "crawls" a list of files.
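Crawling a list of files, rather than walking a directory, can be sketched as reading one path per line and resolving it against a root directory. The one-path-per-line format, the skipping of blank lines, and the name `ListCrawlerSketch` are assumptions; a real crawler would hand each path to the consumers' queue instead of collecting a list.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical list crawler: each non-blank line of the list is treated as a
// path relative to root (an assumed file-list format).
class ListCrawlerSketch {
    static List<Path> crawl(Reader fileList, Path root) throws IOException {
        List<Path> out = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(fileList)) {
            String line;
            while ((line = r.readLine()) != null) {
                String entry = line.trim();
                if (entry.isEmpty()) continue;      // skip blank lines
                out.add(root.resolve(entry).normalize());
            }
        }
        return out;
    }
}
```

In practice the `Reader` would come from something like `Files.newBufferedReader(listFile)`; taking a `Reader` keeps the sketch testable.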
Utility class to handle some common issues when
reading from and writing to a file system (FS).
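Two issues such a utility class plausibly handles are keeping output paths inside the output root and deciding what to do when an output file already exists. This sketch covers both under stated assumptions: `FsSketch`, the `OnExisting` policies, and the `-1`, `-2` renaming scheme are hypothetical, not taken from Tika.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical file-system helpers; not Tika's implementation.
class FsSketch {
    /** Assumed policies for an output file that already exists. */
    enum OnExisting { OVERWRITE, SKIP, RENAME }

    /**
     * Resolves relativeName under root, refusing names that escape root
     * (e.g. "../../etc/passwd"). Returns null when the file should be skipped.
     */
    static Path resolveOutput(Path root, String relativeName, OnExisting policy) {
        Path base = root.toAbsolutePath().normalize();
        Path target = base.resolve(relativeName).normalize();
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("path escapes output root: " + relativeName);
        }
        if (!Files.exists(target) || policy == OnExisting.OVERWRITE) return target;
        if (policy == OnExisting.SKIP) return null;
        // RENAME: insert -1, -2, ... before the extension until the name is free
        String name = target.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String stem = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot) : "";
        for (int i = 1; ; i++) {
            Path candidate = target.resolveSibling(stem + "-" + i + ext);
            if (!Files.exists(candidate)) return candidate;
        }
    }
}
```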
Stub interface to allow for different result types from different processors.
Class that waits for input on System.in.
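A minimal sketch of waiting on standard input, assuming the purpose is to let a parent process (or a user) trigger shutdown by writing to, or closing, the child's stdin. `StdinWatcher` is a hypothetical name; taking an `InputStream` parameter makes it testable without a console.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;

// Hypothetical sketch: blocks until something arrives on the stream (System.in
// by default) or the stream is closed, then returns so the caller can shut down.
class StdinWatcher implements Callable<Boolean> {
    private final InputStream in;

    StdinWatcher(InputStream in) { this.in = in; }

    StdinWatcher() { this(System.in); }

    /** Returns true if input was received, false if the stream reached EOF. */
    @Override
    public Boolean call() throws IOException {
        int b = in.read();  // blocks until a byte is available or EOF
        return b != -1;
    }
}
```

In practice it would be submitted to an `ExecutorService`, and completion of its `Future` would start the shutdown.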
Builds an Interrupter
Same as ObjectFromDOMBuilder, but this is for objects that require access to the shared queue.
Interface for things that build objects from a DOM Node and a map of runtime attributes.
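A hypothetical rendering of that interface and one implementation. The interface name `DomBuilder`, the method name `build`, the `numConsumers` attribute, the default of 1, and the rule that a runtime attribute overrides the config file are all assumptions for illustration; only the DOM types come from the JDK.

```java
import java.util.Map;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical version of the interface described above; Tika's signature may differ.
interface DomBuilder<T> {
    T build(Node node, Map<String, String> runtimeAttributes);
}

// Example implementation: builds a consumer count, letting a runtime attribute
// override the value in the config element (an assumed precedence rule).
class NumConsumersBuilder implements DomBuilder<Integer> {
    @Override
    public Integer build(Node node, Map<String, String> runtimeAttributes) {
        String fromRuntime = runtimeAttributes.get("numConsumers");
        if (fromRuntime != null) return Integer.parseInt(fromRuntime);
        // getAttribute returns "" when the attribute is absent
        String fromConfig = ((Element) node).getAttribute("numConsumers");
        return fromConfig.isEmpty() ? 1 : Integer.parseInt(fromConfig);
    }
}
```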
Utility class to handle properties.
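A sketch of the kind of helper such a class might offer: parsing a string property with a default. `PropsSketch` is hypothetical, and falling back to the default on a malformed value (instead of throwing) is an assumed design choice.

```java
import java.util.Locale;

// Hypothetical property helpers: a missing or malformed value falls back to
// the supplied default rather than failing.
class PropsSketch {
    static int getInt(String value, int defaultValue) {
        if (value == null) return defaultValue;
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    static boolean getBoolean(String value, boolean defaultValue) {
        if (value == null) return defaultValue;
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.equals("true")) return true;
        if (v.equals("false")) return false;
        return defaultValue;
    }
}
```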
This runs a RecursiveParserWrapper against an input file
and writes the JSON metadata to an output file.
Interface for reporter builders
Basic class to use for reporting status from both the crawler and the consumers.
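A minimal sketch of shared status counters that the crawler and consumers could update, with a scheduled task that periodically formats them. `StatusSketch`, the three counters, and the output format are assumptions, not Tika's `StatusReporter`.

```java
import java.util.Locale;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical status reporter: the crawler and consumers bump shared counters,
// and a scheduled task periodically sends a formatted snapshot to a sink.
class StatusSketch {
    final AtomicLong crawled = new AtomicLong();    // incremented by the crawler
    final AtomicLong processed = new AtomicLong();  // incremented by consumers
    final AtomicLong failed = new AtomicLong();     // incremented by consumers

    String snapshot() {
        return String.format(Locale.ROOT, "crawled=%d processed=%d failed=%d",
                crawled.get(), processed.get(), failed.get());
    }

    /** Starts reporting every periodMillis; the caller must shut the executor down. */
    ScheduledExecutorService start(long periodMillis, Consumer<String> sink) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(() -> sink.accept(snapshot()),
                0, periodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```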
Empty class for what a StatusReporter returns when it finishes.
Simple single-threaded class that calls tika-app against every file in a directory.
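A sketch of the single-threaded loop: one `java -jar` process per regular file in a directory (not recursive), each awaited before the next starts. The jar path, the per-file `.out` naming, and the arguments (for example `-t`, tika-app's plain-text option) are supplied by the caller; `TikaAppLoop` is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: runs "java -jar <tikaAppJar> <args...> <file>" once per
// regular file in dir, strictly one at a time.
class TikaAppLoop {
    static List<String> command(Path tikaAppJar, List<String> args, Path file) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(tikaAppJar.toString());
        cmd.addAll(args);
        cmd.add(file.toString());
        return cmd;
    }

    static void runAll(Path dir, Path tikaAppJar, List<String> args, Path outDir)
            throws IOException, InterruptedException {
        Files.createDirectories(outDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (!Files.isRegularFile(f)) continue;
                ProcessBuilder pb = new ProcessBuilder(command(tikaAppJar, args, f));
                File out = outDir.resolve(f.getFileName() + ".out").toFile();
                pb.redirectOutput(out);                          // stdout to a per-file output
                pb.redirectError(ProcessBuilder.Redirect.INHERIT);
                int exit = pb.start().waitFor();                 // wait before the next file
                if (exit != 0) {
                    System.err.println("tika-app failed on " + f + " (exit " + exit + ")");
                }
            }
        }
    }
}
```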
This uses the JsonStreamingSerializer to write out a single metadata object at a time.
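Streaming one object at a time can be sketched with a writer that opens a JSON array, appends each object as it arrives, and closes the array at the end, so earlier objects never need to stay in memory. `StreamingJsonArrayWriter` and the `Map<String, String[]>` stand-in for Tika's metadata are assumptions; this is not the `JsonStreamingSerializer` API.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Map;

// Hypothetical streaming writer: emits a JSON array one metadata object at a time.
// Metadata is modelled as Map<String, String[]> (an assumption).
class StreamingJsonArrayWriter implements AutoCloseable {
    private final Writer out;
    private boolean first = true;

    StreamingJsonArrayWriter(Writer out) throws IOException {
        this.out = out;
        out.write('[');
    }

    void add(Map<String, String[]> metadata) throws IOException {
        if (!first) out.write(',');
        first = false;
        out.write('{');
        boolean firstKey = true;
        for (Map.Entry<String, String[]> e : metadata.entrySet()) {
            if (!firstKey) out.write(',');
            firstKey = false;
            writeString(e.getKey());
            out.write(":[");
            String[] values = e.getValue();
            for (int i = 0; i < values.length; i++) {
                if (i > 0) out.write(',');
                writeString(values[i]);
            }
            out.write(']');
        }
        out.write('}');
        out.flush();  // each object is pushed out as soon as it is complete
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    private void writeString(String s) throws IOException {
        out.write('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') { out.write('\\'); out.write(c); }
            else if (c < 0x20) out.write(String.format("\\u%04x", (int) c));
            else out.write(c);
        }
        out.write('"');
    }

    @Override
    public void close() throws IOException {
        out.write(']');
        out.close();
    }
}
```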