Class DocumentImpl

  • All Implemented Interfaces:
    java.io.Serializable, Document

    public class DocumentImpl
    extends java.lang.Object
    implements Document
    A container for the file data and associated metadata. Metadata for which both the key and the value are of type java.lang.String should be submitted to the index as a field and value. Multiple values for the same field are supported, and addition order is maintained. The file data is discarded by default; if it is to be indexed, a step in the plan that processes this item should process it and add the resulting text as a string value.
    See Also:
    ForwardingListMultimap, Serialized Form
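
The multi-valued, order-preserving field behavior described above can be illustrated with a minimal sketch built only on java.util collections. The class FieldValuesSketch and its method firstValue are hypothetical stand-ins for illustration; they are not part of JesterJ or Guava, and the real class delegates to a ListMultimap instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the JesterJ implementation.
final class FieldValuesSketch {
    // LinkedHashMap keeps fields in first-insertion order;
    // each ArrayList keeps the values of one field in addition order.
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    // Adds one value to a field, creating the field on first use.
    boolean put(String key, String value) {
        return fields.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Returns a copy of the values for a field, or an empty list if absent.
    List<String> get(String key) {
        return new ArrayList<>(fields.getOrDefault(key, List.of()));
    }

    // Returning null for an absent field is a choice made by this sketch,
    // not a documented guarantee of Document.getFirstValue.
    String firstValue(String key) {
        List<String> values = fields.get(key);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }
}
```

Putting "Ada" and then "Grace" under the field "author" and calling get("author") on this sketch yields the two values in that order.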
    • Field Detail

      • DEFAULT_TO_STRING

        public static final java.util.regex.Pattern DEFAULT_TO_STRING
    • Constructor Detail

      • DocumentImpl

        public DocumentImpl​(byte[] rawData,
                            java.lang.String id,
                            Plan plan,
                            Document.Operation operation,
                            Scanner source,
                            java.lang.String origination)
    • Method Detail

      • putAll

        public boolean putAll​(@Nullable
                              java.lang.String key,
                              java.lang.Iterable<? extends java.lang.String> values)
        Specified by:
        putAll in interface Document
      • put

        public boolean put​(@Nonnull
                           java.lang.String key,
                           @Nonnull
                           java.lang.String value)
        Specified by:
        put in interface Document
      • keySet

        public java.util.Set<java.lang.String> keySet()
        Specified by:
        keySet in interface Document
      • containsEntry

        public boolean containsEntry​(@Nullable
                                     java.lang.Object key,
                                     @Nullable
                                     java.lang.Object value)
        Specified by:
        containsEntry in interface Document
      • remove

        public boolean remove​(@Nullable
                              java.lang.Object key,
                              @Nullable
                              java.lang.Object value)
        Specified by:
        remove in interface Document
      • containsValue

        public boolean containsValue​(@Nullable
                                     java.lang.Object value)
        Specified by:
        containsValue in interface Document
      • entries

        public java.util.Collection<java.util.Map.Entry<java.lang.String,​java.lang.String>> entries()
        Specified by:
        entries in interface Document
      • isEmpty

        public boolean isEmpty()
        Specified by:
        isEmpty in interface Document
      • asMap

        public java.util.Map<java.lang.String,​java.util.Collection<java.lang.String>> asMap()
        Specified by:
        asMap in interface Document
      • replaceValues

        public java.util.List<java.lang.String> replaceValues​(@Nullable
                                                              java.lang.String key,
                                                              java.lang.Iterable<? extends java.lang.String> values)
        Specified by:
        replaceValues in interface Document
      • values

        public java.util.Collection<java.lang.String> values()
        Specified by:
        values in interface Document
      • containsKey

        public boolean containsKey​(@Nullable
                                   java.lang.Object key)
        Specified by:
        containsKey in interface Document
      • get

        public java.util.List<java.lang.String> get​(@Nullable
                                                    java.lang.String key)
        Specified by:
        get in interface Document
      • size

        public int size()
        Specified by:
        size in interface Document
      • removeAll

        public java.util.List<java.lang.String> removeAll​(@Nullable
                                                          java.lang.Object key)
        Specified by:
        removeAll in interface Document
      • getRawData

        public byte[] getRawData()
        Description copied from interface: Document
        Get the raw bytes from which this item was constructed. This is usually only used by the first or second step in the pipeline which converts the binary form into entries in this map.
        Specified by:
        getRawData in interface Document
        Returns:
        the actual bytes of the document.
      • setRawData

        public void setRawData​(byte[] rawData)
        Description copied from interface: Document
        Replace the raw bytes. This is only used when the originally indexed document is to be interpreted as a pointer to the "real" document, or when the Item is first constructed.
        Specified by:
        setRawData in interface Document
        Parameters:
        rawData - the actual bytes of the document
      • getStatus

        public Status getStatus​(java.lang.String outputDestination)
        Description copied from interface: Document
        The current processing status of the document relative to a given destination.
        Specified by:
        getStatus in interface Document
        Parameters:
        outputDestination - A destination for which the status is to be reported.
        Returns:
        An enumeration value indicating whether the item is processing, errored out or complete.
      • getStatusMessage

        public java.lang.String getStatusMessage​(java.lang.String outputDestination)
        Description copied from interface: Document
        Get a message relating to the processing status. This will typically be used to print the name of the last successful processor, or the error message, onto the item.
        Specified by:
        getStatusMessage in interface Document
        Parameters:
        outputDestination - the output step for which we want to know the message.
        Returns:
        A short message suitable for logging and debugging (not a stack trace)
      • setStatusForDestinations

        public void setStatusForDestinations​(Status status,
                                             java.util.Collection<java.lang.String> destinations,
                                             java.lang.String statusMessage,
                                             java.io.Serializable... messageArgs)
      • getDelegate

        public com.google.common.collect.ListMultimap<java.lang.String,​java.lang.String> getDelegate()
        Specified by:
        getDelegate in interface Document
      • getId

        public java.lang.String getId()
        Description copied from interface: Document
        Returns the identifier for this document. This should be identical to get(getIdField()).
        Specified by:
        getId in interface Document
        Returns:
        the id
      • getHash

        public java.lang.String getHash()
        Description copied from interface: Document
        A hash based on the contents of the delegate and the raw data.
        Specified by:
        getHash in interface Document
        Returns:
        a hex string md5 checksum
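
As a rough illustration of a hex MD5 checksum over field entries plus raw bytes, the sketch below feeds both into java.security.MessageDigest. The class name DocumentHashSketch, the UTF-8 encoding, and the order in which keys, values, and raw data are digested are all assumptions; the page does not specify how DocumentImpl actually combines them.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the JesterJ implementation.
final class DocumentHashSketch {
    // Digests each key and value in iteration order, then the raw bytes,
    // and returns the result as a 32-character lowercase hex string.
    static String md5Hex(List<Map.Entry<String, String>> entries, byte[] rawData) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (Map.Entry<String, String> e : entries) {
                md5.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md5.update(e.getValue().getBytes(StandardCharsets.UTF_8));
            }
            if (rawData != null) {
                md5.update(rawData);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 unavailable", ex);
        }
    }
}
```

Because both the entries and the raw bytes contribute to the digest, a change to either produces a different checksum.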
      • getIdField

        public java.lang.String getIdField()
        Specified by:
        getIdField in interface Document
      • getFirstValue

        public java.lang.String getFirstValue​(java.lang.String fieldName)
        Specified by:
        getFirstValue in interface Document
      • getParentId

        public java.lang.String getParentId()
        Specified by:
        getParentId in interface Document
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • initDestinations

        public void initDestinations​(java.util.Set<java.lang.String> outputDestinationNames,
                                     java.lang.String scannerName)
      • setForceReprocess

        public void setForceReprocess​(boolean b)
        Description copied from interface: Document
        Ensures that this document will be fed into the plan regardless of memory or hashing settings. Has no effect after the document exits the scanner.
        Specified by:
        setForceReprocess in interface Document
        Parameters:
        b - true if the document should ignore hashing and memory settings.
      • isPlanOutput

        public boolean isPlanOutput​(java.lang.String stepName)
        Specified by:
        isPlanOutput in interface Document
      • getStatusChange

        public DocStatusChange getStatusChange()
        Description copied from interface: Document
        Get the statuses that will be altered if reportDocStatus is invoked.
        Specified by:
        getStatusChange in interface Document
        Returns:
        a map of status changes.
      • setStatus

        public void setStatus​(Status status,
                              java.lang.String message,
                              java.io.Serializable... args)
        Description copied from interface: Document
        Set a status for the current downstream steps. The status message may contain '{}' placeholders, and the additional arguments will be substituted for them in the same manner as in log4j logging messages. Since document objects must remain serializable, these arguments should typically be reduced to strings if they are not already serializable.

         

        WARNING: this method has no persistent effect until Document.reportDocStatus() is called. If the system is killed (power cord, whatever) before Document.reportDocStatus() has completed, this status change will not be retained when JesterJ restarts.

        Specified by:
        setStatus in interface Document
        Parameters:
        status - The status to set for the destination step
        message - The user readable message explaining the status change
        args - values to be substituted into the message
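
The '{}' substitution mentioned above can be approximated with the following sketch. It is a simplified stand-in: StatusMessageSketch and its format method are hypothetical names, and the real logging formatter handles escaping and other edge cases that are ignored here.

```java
import java.io.Serializable;

// Hypothetical helper for illustration only; not the formatter JesterJ uses.
final class StatusMessageSketch {
    // Replaces each "{}" in the message, left to right, with the next argument.
    // Surplus arguments are ignored; surplus placeholders are left untouched.
    static String format(String message, Serializable... args) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        for (Serializable arg : args) {
            int at = message.indexOf("{}", from);
            if (at < 0) {
                break; // no placeholder left for this argument
            }
            out.append(message, from, at).append(arg);
            from = at + 2;
        }
        return out.append(message.substring(from)).toString();
    }
}
```

For example, format("Failed in step {} after {} attempts", "tika", 3) in this sketch produces "Failed in step tika after 3 attempts".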
      • removeDownStreamOutputStep

        public void removeDownStreamOutputStep​(Router router,
                                               java.lang.String name)
        Description copied from interface: Document
        Remove a downstream potent step. This should only be performed by routers and step infrastructure, hence the router argument. If you found a way to call this from your processor, please report a bug in our issue tracker.
        Specified by:
        removeDownStreamOutputStep in interface Document
        Parameters:
        router - The router for the step in which the removal takes place.
        name - the name of the step to remove
      • dumpStatus

        public java.lang.String dumpStatus()
        Description copied from interface: Document
        Get a string representation of the current status information. For debugging only.
        Specified by:
        dumpStatus in interface Document
        Returns:
        status information in string form.
      • getOrigination

        public java.lang.String getOrigination()
        Description copied from interface: Document
        Identify whether the document originated from fault tolerance or from a scan.
        Specified by:
        getOrigination in interface Document
        Returns:
        the source type for the document for debugging.
      • removeAllOtherDestinationsQuietly

        public void removeAllOtherDestinationsQuietly​(java.util.Set<java.lang.String> outputDestinationNames)
        Description copied from interface: Document
        For use during routing, to remove destinations from document duplicates before distribution to downstream steps. This is only necessary for routers that route to more than one step and therefore must clone documents. The clones need to be adjusted so that they do not have statuses for destinations that are not downstream from their immediate targets, without causing updates to the removed destinations (which might still be serviced by another clone). The returned destinations need to be aggregated and inspected to subsequently determine whether the routing is in effect dropping some destinations. For example, a router that sends 2 copies to 2 out of 3 possible downstream steps must issue DROPPED status updates only for the 3rd step, which doesn't get a clone of the original document.
        Specified by:
        removeAllOtherDestinationsQuietly in interface Document
        Parameters:
        outputDestinationNames - the correct destinations at which this clone will be targeted
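
The aggregation described for routers can be sketched as a set difference: start from every destination the original document could reach and subtract the destinations each clone keeps. The class DroppedDestinationsSketch, its dropped method, and the destination names used in the usage note are hypothetical; this is an assumption about how a router might compute the dropped set, not JesterJ's routing code.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of JesterJ.
final class DroppedDestinationsSketch {
    // Returns the destinations that no clone retains. These are the only
    // ones for which a router would need to issue a DROPPED status.
    static Set<String> dropped(Set<String> allDestinations,
                               Collection<Set<String>> keptPerClone) {
        Set<String> remaining = new TreeSet<>(allDestinations);
        for (Set<String> kept : keptPerClone) {
            remaining.removeAll(kept);
        }
        return remaining;
    }
}
```

With three possible destinations "solrA", "solrB", and "solrC", and two clones that keep "solrA" and "solrB" respectively, the sketch reports only "solrC" as dropped.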