Package org.jesterj.ingest.model.impl
Class DocumentImpl
java.lang.Object
  org.jesterj.ingest.model.impl.DocumentImpl

All Implemented Interfaces:
java.io.Serializable, Document
public class DocumentImpl extends java.lang.Object implements Document
A container for the file data and associated metadata. Metadata for which both the key and the value are of type java.lang.String should be submitted to the index as a field and value. Multiple values for the same field are supported, and addition order is maintained. The file data is discarded by default; if it is to be indexed, a step in the plan that processes this item should process it and add the resulting text as a string value.
- See Also:
ForwardingListMultimap, Serialized Form
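The multi-value, order-preserving behavior described above can be illustrated with a minimal sketch. It deliberately uses only java.util collections rather than DocumentImpl or Guava; the class name MultiValueFieldsSketch and the field names are hypothetical, and the real class may differ in details such as null handling.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT DocumentImpl.
public class MultiValueFieldsSketch {
  // Field name -> values, kept in the order they were added.
  private final Map<String, List<String>> fields = new LinkedHashMap<>();

  /** Appends a value to a field without discarding earlier values. */
  public boolean put(String key, String value) {
    return fields.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
  }

  /** Returns the values of a field in addition order, or an empty list. */
  public List<String> get(String key) {
    return fields.getOrDefault(key, Collections.emptyList());
  }

  public static void main(String[] args) {
    MultiValueFieldsSketch doc = new MultiValueFieldsSketch();
    doc.put("author", "Ada");
    doc.put("title", "Notes");
    doc.put("author", "Grace");
    System.out.println(doc.get("author"));  // [Ada, Grace]
    System.out.println(doc.get("missing")); // []
  }
}
```

The second value for "author" is appended after the first rather than replacing it, which is the property the class description relies on.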
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.jesterj.ingest.model.Document
Document.Operation
-
-
Field Summary
Fields
Modifier and Type                Field
static java.lang.String          CHILD_SEP
static java.util.regex.Pattern   DEFAULT_TO_STRING
-
Fields inherited from interface org.jesterj.ingest.model.Document
DOC_RAW_SIZE
-
-
Constructor Summary
Constructors
DocumentImpl(byte[] rawData, java.lang.String id, Document.Operation oper, DocumentImpl parent)
DocumentImpl(byte[] rawData, java.lang.String id, Plan plan, Document.Operation operation, Scanner source, java.lang.String origination)
-
Method Summary
Modifier and Type, Method, Description
boolean alreadyHasIncompleteStepList()
java.util.Map<java.lang.String,java.util.Collection<java.lang.String>> asMap()
boolean containsEntry(java.lang.Object key, java.lang.Object value)
boolean containsKey(java.lang.Object key)
boolean containsValue(java.lang.Object value)
java.lang.String dumpStatus()
    Get a string representation of the current status information.
java.util.Collection<java.util.Map.Entry<java.lang.String,java.lang.String>> entries()
java.util.List<java.lang.String> get(java.lang.String key)
com.google.common.collect.ListMultimap<java.lang.String,java.lang.String> getDelegate()
java.lang.String getFirstValue(java.lang.String fieldName)
java.lang.String getHash()
    A hash based on the contents of the delegate and the raw data.
java.lang.String getId()
    Returns the identifier for this document.
java.lang.String getIdField()
java.lang.String[] getIncompleteOutputDestinations()
Document.Operation getOperation()
java.lang.String getOrigination()
    Identify if the document originated from fault tolerance or a scan.
java.lang.String getOrignalParentId()
java.lang.String getParentId()
byte[] getRawData()
    Get the raw bytes from which this item was constructed.
java.lang.String getSourceScannerName()
Status getStatus(java.lang.String outputDestination)
    The current processing status of the document relative to a given destination.
DocStatusChange getStatusChange()
    Get the statuses that will be altered if reportDocStatus is invoked.
java.lang.String getStatusMessage(java.lang.String outputDestination)
    Get a message relating to the processing status.
void initDestinations(java.util.Set<java.lang.String> outputDestinationNames, java.lang.String scannerName)
boolean isEmpty()
boolean isForceReprocess()
boolean isPlanOutput(java.lang.String stepName)
boolean isStatusChanged()
java.util.Set<java.lang.String> keySet()
java.util.List<java.lang.String> listChangingDestinations()
java.lang.String listIncompleteOutputSteps()
boolean put(java.lang.String key, java.lang.String value)
boolean putAll(java.lang.String key, java.lang.Iterable<? extends java.lang.String> values)
boolean remove(java.lang.Object key, java.lang.Object value)
java.util.List<java.lang.String> removeAll(java.lang.Object key)
void removeAllOtherDestinationsQuietly(java.util.Set<java.lang.String> outputDestinationNames)
    For use during routing, to remove destinations from document duplicates before distribution to downstream steps.
void removeDownStreamOutputStep(Router router, java.lang.String name)
    Remove a downstream potent step.
java.util.List<java.lang.String> replaceValues(java.lang.String key, java.lang.Iterable<? extends java.lang.String> values)
void reportDocStatus()
void setForceReprocess(boolean b)
    Ensures that this document will be fed into the plan regardless of memory or hashing settings.
void setIncompleteOutputDestinations(java.util.Map<java.lang.String,DocDestinationStatus> value)
void setRawData(byte[] rawData)
    Replace the raw bytes.
void setStatus(Status status, java.lang.String message, java.io.Serializable... args)
    Set a status for the current downstream steps.
void setStatusForDestinations(Status status, java.util.Collection<java.lang.String> destinations, java.lang.String statusMessage, java.io.Serializable... messageArgs)
int size()
java.lang.String toString()
java.util.Collection<java.lang.String> values()
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.jesterj.ingest.model.Document
getHashAlg
-
-
-
-
Field Detail
-
CHILD_SEP
public static final java.lang.String CHILD_SEP
- See Also:
- Constant Field Values
-
DEFAULT_TO_STRING
public static final java.util.regex.Pattern DEFAULT_TO_STRING
-
-
Constructor Detail
-
DocumentImpl
public DocumentImpl(byte[] rawData, java.lang.String id, Plan plan, Document.Operation operation, Scanner source, java.lang.String origination)
-
DocumentImpl
public DocumentImpl(byte[] rawData, java.lang.String id, Document.Operation oper, DocumentImpl parent)
-
-
Method Detail
-
putAll
public boolean putAll(@Nullable java.lang.String key, java.lang.Iterable<? extends java.lang.String> values)
-
put
public boolean put(@Nonnull java.lang.String key, @Nonnull java.lang.String value)
-
containsEntry
public boolean containsEntry(@Nullable java.lang.Object key, @Nullable java.lang.Object value)
- Specified by:
containsEntry in interface Document
-
remove
public boolean remove(@Nullable java.lang.Object key, @Nullable java.lang.Object value)
-
containsValue
public boolean containsValue(@Nullable java.lang.Object value)
- Specified by:
containsValue in interface Document
-
entries
public java.util.Collection<java.util.Map.Entry<java.lang.String,java.lang.String>> entries()
-
asMap
public java.util.Map<java.lang.String,java.util.Collection<java.lang.String>> asMap()
-
replaceValues
public java.util.List<java.lang.String> replaceValues(@Nullable java.lang.String key, java.lang.Iterable<? extends java.lang.String> values)
- Specified by:
replaceValues in interface Document
-
values
public java.util.Collection<java.lang.String> values()
-
containsKey
public boolean containsKey(@Nullable java.lang.Object key)
- Specified by:
containsKey in interface Document
-
get
public java.util.List<java.lang.String> get(@Nullable java.lang.String key)
-
removeAll
public java.util.List<java.lang.String> removeAll(@Nullable java.lang.Object key)
-
getRawData
public byte[] getRawData()
Description copied from interface: Document
Get the raw bytes from which this item was constructed. This is usually only used by the first or second step in the pipeline, which converts the binary form into entries in this map.
- Specified by:
getRawData in interface Document
- Returns:
- the actual bytes of the document.
-
setRawData
public void setRawData(byte[] rawData)
Description copied from interface: Document
Replace the raw bytes. This is only used when the originally indexed document is to be interpreted as a pointer to the "real" document, or when the Item is first constructed.
- Specified by:
setRawData in interface Document
- Parameters:
rawData - the actual bytes of the document
-
getStatus
public Status getStatus(java.lang.String outputDestination)
Description copied from interface: Document
The current processing status of the document relative to a given destination.
-
getStatusMessage
public java.lang.String getStatusMessage(java.lang.String outputDestination)
Description copied from interface: Document
Get a message relating to the processing status. This will typically be used to print the name of the last successful processor, or the error message, onto the item.
- Specified by:
getStatusMessage in interface Document
- Parameters:
outputDestination - the output step for which we want to know the message.
- Returns:
- A short message suitable for logging and debugging (not a stack trace)
-
setStatusForDestinations
public void setStatusForDestinations(Status status, java.util.Collection<java.lang.String> destinations, java.lang.String statusMessage, java.io.Serializable... messageArgs)
-
getDelegate
public com.google.common.collect.ListMultimap<java.lang.String,java.lang.String> getDelegate()
- Specified by:
getDelegate in interface Document
-
getId
public java.lang.String getId()
Description copied from interface: Document
Returns the identifier for this document. This should be identical to get(getIdField()).
-
getHash
public java.lang.String getHash()
Description copied from interface: Document
A hash based on the contents of the delegate and the raw data.
-
getIdField
public java.lang.String getIdField()
- Specified by:
getIdField in interface Document
-
getOperation
public Document.Operation getOperation()
- Specified by:
getOperation in interface Document
-
getSourceScannerName
public java.lang.String getSourceScannerName()
- Specified by:
getSourceScannerName in interface Document
-
getFirstValue
public java.lang.String getFirstValue(java.lang.String fieldName)
- Specified by:
getFirstValue in interface Document
-
getParentId
public java.lang.String getParentId()
- Specified by:
getParentId in interface Document
-
getOrignalParentId
public java.lang.String getOrignalParentId()
- Specified by:
getOrignalParentId in interface Document
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
isStatusChanged
public boolean isStatusChanged()
- Specified by:
isStatusChanged in interface Document
-
reportDocStatus
public void reportDocStatus()
- Specified by:
reportDocStatus in interface Document
-
initDestinations
public void initDestinations(java.util.Set<java.lang.String> outputDestinationNames, java.lang.String scannerName)
-
setForceReprocess
public void setForceReprocess(boolean b)
Description copied from interface: Document
Ensures that this document will be fed into the plan regardless of memory or hashing settings. Has no effect after the document exits the scanner.
- Specified by:
setForceReprocess in interface Document
- Parameters:
b - true if the document should ignore hashing and memory settings.
-
isForceReprocess
public boolean isForceReprocess()
- Specified by:
isForceReprocess in interface Document
-
setIncompleteOutputDestinations
public void setIncompleteOutputDestinations(java.util.Map<java.lang.String,DocDestinationStatus> value)
- Specified by:
setIncompleteOutputDestinations in interface Document
-
alreadyHasIncompleteStepList
public boolean alreadyHasIncompleteStepList()
- Specified by:
alreadyHasIncompleteStepList in interface Document
-
isPlanOutput
public boolean isPlanOutput(java.lang.String stepName)
- Specified by:
isPlanOutput in interface Document
-
listIncompleteOutputSteps
public java.lang.String listIncompleteOutputSteps()
- Specified by:
listIncompleteOutputSteps in interface Document
-
getStatusChange
public DocStatusChange getStatusChange()
Description copied from interface: Document
Get the statuses that will be altered if reportDocStatus is invoked.
- Specified by:
getStatusChange in interface Document
- Returns:
- a map of status changes.
-
listChangingDestinations
public java.util.List<java.lang.String> listChangingDestinations()
- Specified by:
listChangingDestinations in interface Document
-
getIncompleteOutputDestinations
public java.lang.String[] getIncompleteOutputDestinations()
- Specified by:
getIncompleteOutputDestinations in interface Document
-
setStatus
public void setStatus(Status status, java.lang.String message, java.io.Serializable... args)
Description copied from interface: Document
Set a status for the current downstream steps. The status message may contain '{}' placeholders, and additional arguments will be substituted in the same manner as log4j logging messages. Since document objects must remain serializable, these arguments should typically be reduced to strings if they are not already serializable.
WARNING: this method has no persistent effect until Document.reportDocStatus() is called. If the system is killed (power cord, whatever) before reportDocStatus() completes, this status change will not be retained when JesterJ restarts.
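As a rough illustration of the '{}' placeholder substitution mentioned above, here is a simplified sketch. It is not JesterJ's or log4j's implementation; the class StatusMessageSketch and its format method are hypothetical, and the sketch ignores details such as escaped braces that log4j handles.

```java
import java.io.Serializable;

// Hypothetical helper for illustration; not part of JesterJ or log4j.
public class StatusMessageSketch {
  /** Replaces each "{}" in the template with the next argument, left to right. */
  public static String format(String template, Serializable... args) {
    StringBuilder out = new StringBuilder();
    int from = 0;
    int argIndex = 0;
    while (argIndex < args.length) {
      int at = template.indexOf("{}", from);
      if (at < 0) {
        break; // more arguments than placeholders: ignore the extras
      }
      out.append(template, from, at).append(args[argIndex++]);
      from = at + 2;
    }
    // Any placeholders left without an argument are kept verbatim.
    return out.append(template.substring(from)).toString();
  }

  public static void main(String[] args) {
    // prints "Failed in step tika after 3 retries"
    System.out.println(format("Failed in step {} after {} retries", "tika", 3));
  }
}
```

Passing values that are already serializable (strings, boxed numbers) keeps with the advice above about reducing arguments to strings where necessary.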
-
removeDownStreamOutputStep
public void removeDownStreamOutputStep(Router router, java.lang.String name)
Description copied from interface: Document
Remove a downstream potent step. This should only be performed by routers and step infrastructure, hence the router argument. If you found a way to call this from your processor, please report a bug in our issue tracker.
- Specified by:
removeDownStreamOutputStep in interface Document
- Parameters:
router - The router for the step in which the removal takes place.
name - the name of the step to remove
-
dumpStatus
public java.lang.String dumpStatus()
Description copied from interface: Document
Get a string representation of the current status information. For debugging only.
- Specified by:
dumpStatus in interface Document
- Returns:
- status information in string form.
-
getOrigination
public java.lang.String getOrigination()
Description copied from interface: Document
Identify if the document originated from fault tolerance or a scan.
- Specified by:
getOrigination in interface Document
- Returns:
- the source type for the document for debugging.
-
removeAllOtherDestinationsQuietly
public void removeAllOtherDestinationsQuietly(java.util.Set<java.lang.String> outputDestinationNames)
Description copied from interface: Document
For use during routing, to remove destinations from document duplicates before distribution to downstream steps. This is only necessary for routers that route to more than one step and therefore must clone documents. The clones need to be adjusted so that they do not have statuses for destinations not downstream from their immediate targets, without causing updates to the removed destinations (which might still be serviced by another clone). The returned destinations need to be aggregated and inspected to subsequently determine if the routing is in effect dropping some destinations. For example, a router that sends 2 copies to 2 out of 3 possible downstream steps has to issue DROPPED status updates only for the 3rd step, which doesn't get a clone of the original document.
- Specified by:
removeAllOtherDestinationsQuietly in interface Document
- Parameters:
outputDestinationNames - the correct destinations at which this clone will be targeted
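The "2 out of 3" example above comes down to set arithmetic: a destination is dropped only if no clone still targets it. The sketch below shows that calculation with plain java.util sets. It does not call any JesterJ API; the class DroppedDestinationsSketch and the destination names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the aggregation a router would perform.
public class DroppedDestinationsSketch {
  /** Destinations served by no clone: all destinations minus every clone's targets. */
  public static Set<String> dropped(Set<String> all, List<Set<String>> cloneTargets) {
    Set<String> remaining = new LinkedHashSet<>(all);
    for (Set<String> targets : cloneTargets) {
      remaining.removeAll(targets);
    }
    return remaining;
  }

  public static void main(String[] args) {
    // Three possible downstream destinations (names are made up).
    Set<String> all = new LinkedHashSet<>(Arrays.asList("solr", "elastic", "kafka"));
    // Two clones, each targeted at one destination.
    Set<String> cloneA = new LinkedHashSet<>(Arrays.asList("solr"));
    Set<String> cloneB = new LinkedHashSet<>(Arrays.asList("elastic"));
    // Only the third destination should receive a DROPPED status.
    System.out.println(dropped(all, Arrays.asList(cloneA, cloneB))); // [kafka]
  }
}
```

Computing the dropped set from the aggregate, rather than per clone, avoids marking a destination as dropped when another clone is still headed there.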
-
-