Package org.jesterj.ingest.model
Interface Document
-
- All Superinterfaces:
java.io.Serializable
- All Known Implementing Classes:
DocumentImpl
public interface Document extends java.io.Serializable
The publicly usable methods on a document. This interface is not meant to have multiple implementations. It mostly exists to document the methods that authors of processors should interact with. There are multiple places where this will get cast to DocumentImpl, but doing so inside a processor may be dangerous.
-
-
Nested Class Summary
static class Document.Operation
-
Field Summary
static java.lang.String DOC_RAW_SIZE
    The 'file_size' field which holds the size of the original content for an input document as the framework first pulled it in.
-
Method Summary
boolean alreadyHasIncompleteStepList()
java.util.Map<java.lang.String,java.util.Collection<java.lang.String>> asMap()
boolean containsEntry(java.lang.Object key, java.lang.Object value)
boolean containsKey(java.lang.Object key)
boolean containsValue(java.lang.Object value)
java.lang.String dumpStatus()
    Get a string representation of the current status information.
java.util.Collection<java.util.Map.Entry<java.lang.String,java.lang.String>> entries()
java.util.List<java.lang.String> get(java.lang.String key)
com.google.common.collect.ListMultimap<java.lang.String,java.lang.String> getDelegate()
java.lang.String getFirstValue(java.lang.String fieldName)
java.lang.String getHash()
    A hash based on the contents of the delegate and the raw data.
default java.lang.String getHashAlg()
java.lang.String getId()
    Returns the identifier for this document.
java.lang.String getIdField()
java.lang.String[] getIncompleteOutputDestinations()
Document.Operation getOperation()
java.lang.String getOrigination()
    Identify if the document originated from Fault tolerance or a scan.
java.lang.String getOrignalParentId()
java.lang.String getParentId()
byte[] getRawData()
    Get the raw bytes from which this item was constructed.
java.lang.String getSourceScannerName()
Status getStatus(java.lang.String outputDestination)
    The current processing status of the document relative to a given destination.
DocStatusChange getStatusChange()
    Get the statuses that will be altered if reportDocStatus is invoked.
java.lang.String getStatusMessage(java.lang.String outputStep)
    Get a message relating to the processing status.
boolean isEmpty()
boolean isForceReprocess()
boolean isPlanOutput(java.lang.String stepName)
boolean isStatusChanged()
java.util.Set<java.lang.String> keySet()
java.util.List<java.lang.String> listChangingDestinations()
java.lang.String listIncompleteOutputSteps()
boolean put(java.lang.String field, java.lang.String value)
boolean putAll(java.lang.String key, java.lang.Iterable<? extends java.lang.String> values)
boolean remove(java.lang.Object key, java.lang.Object value)
java.util.List<java.lang.String> removeAll(java.lang.Object key)
void removeAllOtherDestinationsQuietly(java.util.Set<java.lang.String> outputDestinationNames)
    For use during routing, to remove destinations from document duplicates before distribution to downstream steps.
void removeDownStreamOutputStep(Router routerBase, java.lang.String name)
    Remove a downstream potent step.
java.util.List<java.lang.String> replaceValues(java.lang.String key, java.lang.Iterable<? extends java.lang.String> values)
void reportDocStatus()
void setForceReprocess(boolean b)
    Ensures that this document will be fed into the plan regardless of memory or hashing settings.
void setIncompleteOutputDestinations(java.util.Map<java.lang.String,DocDestinationStatus> value)
void setRawData(byte[] rawData)
    Replace the raw bytes.
void setStatus(Status status, java.lang.String message, java.io.Serializable... args)
    Set a status for the current downstream steps.
int size()
java.util.Collection<java.lang.String> values()
-
-
-
Field Detail
-
DOC_RAW_SIZE
static final java.lang.String DOC_RAW_SIZE
The 'file_size' field which holds the size of the original content for an input document as the framework first pulled it in.
- See Also:
- Constant Field Values
-
-
Method Detail
-
keySet
java.util.Set<java.lang.String> keySet()
-
containsEntry
boolean containsEntry(@Nullable java.lang.Object key, @Nullable java.lang.Object value)
-
remove
boolean remove(@Nullable java.lang.Object key, @Nullable java.lang.Object value)
-
containsValue
boolean containsValue(@Nullable java.lang.Object value)
-
entries
java.util.Collection<java.util.Map.Entry<java.lang.String,java.lang.String>> entries()
-
isEmpty
boolean isEmpty()
-
asMap
java.util.Map<java.lang.String,java.util.Collection<java.lang.String>> asMap()
-
replaceValues
java.util.List<java.lang.String> replaceValues(@Nullable java.lang.String key, java.lang.Iterable<? extends java.lang.String> values)
-
values
java.util.Collection<java.lang.String> values()
-
containsKey
boolean containsKey(@Nullable java.lang.Object key)
-
get
java.util.List<java.lang.String> get(@Nullable java.lang.String key)
-
size
int size()
-
removeAll
java.util.List<java.lang.String> removeAll(@Nullable java.lang.Object key)
-
putAll
boolean putAll(@Nullable java.lang.String key, java.lang.Iterable<? extends java.lang.String> values)
-
put
boolean put(java.lang.String field, java.lang.String value)
-
getRawData
byte[] getRawData()
Get the raw bytes from which this item was constructed. This is usually only used by the first or second step in the pipeline which converts the binary form into entries in this map.
- Returns:
- the actual bytes of the document.
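The conversion from raw bytes to field entries described above can be sketched roughly as follows. This is only an illustration: it assumes the raw data is UTF-8 text with one key=value pair per line, and the class RawDataSketch, its parse method, and the plain map of lists standing in for the document are all hypothetical rather than part of JesterJ. In a real processor the bytes would come from getRawData() and each pair would be stored with put(field, value).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an early pipeline step; not part of JesterJ.
final class RawDataSketch {

  // Parses "key=value" lines (an assumed input format) into a
  // field -> values structure that mimics the document's multimap.
  static Map<String, List<String>> parse(byte[] rawData) {
    Map<String, List<String>> fields = new LinkedHashMap<>();
    String text = new String(rawData, StandardCharsets.UTF_8);
    for (String line : text.split("\n")) {
      int eq = line.indexOf('=');
      if (eq <= 0) {
        continue; // skip blank lines and lines without a field name
      }
      String field = line.substring(0, eq).trim();
      String value = line.substring(eq + 1).trim();
      // A field may hold several values, as with Document.put(field, value).
      fields.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }
    return fields;
  }
}
```

Repeated keys accumulate as multiple values for the same field, which mirrors the list-valued get(key) on this interface.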
-
setRawData
void setRawData(byte[] rawData)
Replace the raw bytes. This is only used when the originally indexed document is to be interpreted as a pointer to the "real" document, or when the Item is first constructed.
- Parameters:
rawData - the actual bytes of the document
-
getStatus
Status getStatus(java.lang.String outputDestination)
The current processing status of the document relative to a given destination.
- Parameters:
outputDestination - A destination for which the status is to be reported.
- Returns:
- An enumeration value indicating whether the item is processing, errored out or complete.
- Throws:
java.lang.IllegalStateException - if the plan has not been set.
-
getStatusMessage
java.lang.String getStatusMessage(java.lang.String outputStep)
Get a message relating to the processing status. This will typically be used to print the name of the last successful processor, or the error message onto the item.
- Parameters:
outputStep - the output step for which we want to know the message.
- Returns:
- A short message suitable for logging and debugging (not a stack trace)
-
getDelegate
com.google.common.collect.ListMultimap<java.lang.String,java.lang.String> getDelegate()
-
getId
java.lang.String getId()
Returns the identifier for this document. This should be identical to get(getIdField()).
- Returns:
- the id
-
getHash
java.lang.String getHash()
A hash based on the contents of the delegate and the raw data.
- Returns:
- a hex string md5 checksum
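For readers unfamiliar with hex checksums, the following is a minimal sketch of how a digest over field contents plus raw data could be produced with the JDK's MessageDigest. The class HashSketch, the method hexDigest, and the decision to feed the field contents as one UTF-8 string are assumptions made for illustration; this page does not specify how the implementation serializes the delegate before hashing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not the DocumentImpl implementation.
final class HashSketch {

  // Digests the field contents followed by the raw bytes and
  // returns the result as a lowercase hex string.
  static String hexDigest(String algorithm, String fieldContents, byte[] rawData) {
    MessageDigest digest;
    try {
      digest = MessageDigest.getInstance(algorithm);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("Unknown digest algorithm: " + algorithm, e);
    }
    digest.update(fieldContents.getBytes(StandardCharsets.UTF_8));
    if (rawData != null) {
      digest.update(rawData);
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b)); // two hex digits per byte
    }
    return hex.toString();
  }
}
```

Because both inputs go into the same digest, a change to either the field values or the raw bytes yields a different checksum.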
-
getHashAlg
default java.lang.String getHashAlg()
-
getIdField
java.lang.String getIdField()
-
getOperation
Document.Operation getOperation()
-
getSourceScannerName
java.lang.String getSourceScannerName()
-
getFirstValue
java.lang.String getFirstValue(java.lang.String fieldName)
-
getParentId
java.lang.String getParentId()
-
getOrignalParentId
java.lang.String getOrignalParentId()
-
isStatusChanged
boolean isStatusChanged()
-
reportDocStatus
void reportDocStatus()
-
setForceReprocess
void setForceReprocess(boolean b)
Ensures that this document will be fed into the plan regardless of memory or hashing settings. Has no effect after the document exits the scanner.
- Parameters:
b - true if the document should ignore hashing and memory settings.
-
isForceReprocess
boolean isForceReprocess()
-
setIncompleteOutputDestinations
void setIncompleteOutputDestinations(java.util.Map<java.lang.String,DocDestinationStatus> value)
-
alreadyHasIncompleteStepList
boolean alreadyHasIncompleteStepList()
-
isPlanOutput
boolean isPlanOutput(java.lang.String stepName)
-
listIncompleteOutputSteps
java.lang.String listIncompleteOutputSteps()
-
getStatusChange
DocStatusChange getStatusChange()
Get the statuses that will be altered if reportDocStatus() is invoked.
- Returns:
- a map of status changes.
-
listChangingDestinations
java.util.List<java.lang.String> listChangingDestinations()
-
getIncompleteOutputDestinations
java.lang.String[] getIncompleteOutputDestinations()
-
setStatus
void setStatus(Status status, java.lang.String message, java.io.Serializable... args)
Set a status for the current downstream steps. The status message may contain '{}' placeholders, which are replaced by the additional arguments in the same manner as log4j logging messages. Since document objects must remain serializable, these arguments should typically be reduced to strings if they are not already serializable.
WARNING: this method has no persistent effect until reportDocStatus() is called. If the system is killed (power cord, whatever) before reportDocStatus() has completed, this status change will not be retained when JesterJ restarts.
- Parameters:
status - The status to set for the destination step
message - The user readable message explaining the status change
args - values to be substituted into the message
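To make the '{}' substitution concrete, here is a small sketch of positional placeholder replacement. It is not log4j's formatter, which handles more cases; the class StatusMessageSketch and its format method are hypothetical and exist only to show how a message and its arguments combine.

```java
import java.io.Serializable;

// Hypothetical illustration of '{}' substitution; not JesterJ or log4j code.
final class StatusMessageSketch {

  // Replaces each "{}" in order with the next argument's string form.
  // Placeholders without a matching argument are left untouched.
  static String format(String message, Serializable... args) {
    StringBuilder out = new StringBuilder();
    int from = 0;
    for (Serializable arg : args) {
      int at = message.indexOf("{}", from);
      if (at < 0) {
        break; // more arguments than placeholders
      }
      out.append(message, from, at).append(arg);
      from = at + 2;
    }
    return out.append(message.substring(from)).toString();
  }
}
```

With this sketch, format("Indexed {} in {} ms", "doc-1", "42") yields "Indexed doc-1 in 42 ms". Passing strings, as here, also satisfies the requirement that arguments be serializable.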
-
removeDownStreamOutputStep
void removeDownStreamOutputStep(Router routerBase, java.lang.String name)
Remove a downstream potent step. This should only be performed by routers and step infrastructure, hence the router argument. If you found a way to call this from your processor, please report a bug in our issue tracker.
- Parameters:
routerBase - The router for the step in which the removal takes place.
name - the name of the step to remove
-
dumpStatus
java.lang.String dumpStatus()
Get a string representation of the current status information. For debugging only.
- Returns:
- status information in string form.
-
getOrigination
java.lang.String getOrigination()
Identify whether the document originated from fault tolerance or a scan.
- Returns:
- the source type for the document for debugging.
-
removeAllOtherDestinationsQuietly
void removeAllOtherDestinationsQuietly(java.util.Set<java.lang.String> outputDestinationNames)
For use during routing, to remove destinations from document duplicates before distribution to downstream steps. This is only necessary for routers that route to more than one step and therefore must clone documents. The clones need to be adjusted so that they do not have statuses for destinations not downstream from their immediate targets, without causing updates to the removed destinations (which might still be serviced by another clone). The returned destinations need to be aggregated and inspected to subsequently determine whether the routing is in effect dropping some destinations. For example, a router that sends 2 copies to 2 out of 3 possible downstream steps must issue DROPPED status updates only for the 3rd step, which doesn't get a clone of the original document.
- Parameters:
outputDestinationNames - the correct destinations at which this clone will be targeted
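The "2 out of 3" example can be expressed as simple set arithmetic. The sketch below assumes destinations are identified by name; the class RoutingSketch and the method droppedDestinations are hypothetical and only model the aggregation a router would need to perform, not JesterJ's actual routing code.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the aggregation step; not part of JesterJ.
final class RoutingSketch {

  // Returns the destinations that no clone still targets. Only these
  // would warrant a DROPPED status update in the scenario described above.
  static Set<String> droppedDestinations(Set<String> allDestinations,
                                         List<Set<String>> keptPerClone) {
    Set<String> dropped = new TreeSet<>(allDestinations);
    for (Set<String> kept : keptPerClone) {
      dropped.removeAll(kept); // still serviced by this clone
    }
    return dropped;
  }
}
```

For three destinations destA, destB and destC, with one clone keeping destA and another keeping destB, the result contains only destC.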
-
-