Interface Document

  • All Superinterfaces:
    java.io.Serializable
    All Known Implementing Classes:
    DocumentImpl

    public interface Document
    extends java.io.Serializable
    The publicly usable methods on a document. This interface is not meant to have multiple implementations; it mostly exists to document the methods that authors of processors should interact with. The framework casts it to DocumentImpl in several places, but doing the same inside a processor may be dangerous.
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Interface Description
      static class  Document.Operation  
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String DOC_RAW_SIZE
      The 'file_size' field, which holds the size of the original content of an input document as the framework first pulled it in.
    • Field Detail

      • DOC_RAW_SIZE

        static final java.lang.String DOC_RAW_SIZE
        The 'file_size' field, which holds the size of the original content of an input document as the framework first pulled it in.
        See Also:
        Constant Field Values
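        A minimal sketch of how a processor might read this field. It assumes the constant's value is the string "file_size" (as the description suggests) and that the value is stored as a decimal byte count. To compile without JesterJ, the document's fields are modeled as a plain Map<String, List<String>>; RawSizeExample and rawSize are hypothetical names, not part of the API. In a real processor the equivalent lookup would go through getFirstValue(Document.DOC_RAW_SIZE).

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

class RawSizeExample {
    // Assumed value of Document.DOC_RAW_SIZE, per the description above.
    static final String DOC_RAW_SIZE = "file_size";

    // Hypothetical helper: reads the first 'file_size' value, assuming it is a
    // decimal byte count stored as a string. Empty if absent or malformed.
    static OptionalLong rawSize(Map<String, List<String>> fields) {
        List<String> values = fields.get(DOC_RAW_SIZE);
        if (values == null || values.isEmpty() || values.get(0) == null) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(values.get(0).trim()));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }
}
```

        Returning OptionalLong instead of throwing keeps a processor tolerant of documents that never had the field set.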
    • Method Detail

      • keySet

        java.util.Set<java.lang.String> keySet()
      • containsEntry

        boolean containsEntry(@Nullable java.lang.Object key,
                              @Nullable java.lang.Object value)
      • remove

        boolean remove(@Nullable java.lang.Object key,
                       @Nullable java.lang.Object value)
      • containsValue

        boolean containsValue(@Nullable java.lang.Object value)
      • entries

        java.util.Collection<java.util.Map.Entry<java.lang.String, java.lang.String>> entries()
      • isEmpty

        boolean isEmpty()
      • asMap

        java.util.Map<java.lang.String, java.util.Collection<java.lang.String>> asMap()
      • replaceValues

        java.util.List<java.lang.String> replaceValues(@Nullable java.lang.String key,
                                                       java.lang.Iterable<? extends java.lang.String> values)
      • values

        java.util.Collection<java.lang.String> values()
      • containsKey

        boolean containsKey(@Nullable java.lang.Object key)
      • get

        java.util.List<java.lang.String> get(@Nullable java.lang.String key)
      • size

        int size()
      • removeAll

        java.util.List<java.lang.String> removeAll(@Nullable java.lang.Object key)
      • putAll

        boolean putAll(@Nullable java.lang.String key,
                       java.lang.Iterable<? extends java.lang.String> values)
      • put

        boolean put(java.lang.String field,
                    java.lang.String value)
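        The map-style methods above carry no descriptions. Because getDelegate() returns a Guava ListMultimap, it seems reasonable to assume they follow ListMultimap semantics: put appends rather than replaces, size counts key/value pairs, and replaceValues and removeAll return the values they displaced. The hypothetical FieldMultimapSketch class below illustrates those assumed semantics with plain JDK collections; it is not the DocumentImpl implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the multimap-style field methods on Document,
// assuming they behave like Guava's ListMultimap.
class FieldMultimapSketch {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    // put appends a value; existing values for the field are kept.
    boolean put(String field, String value) {
        fields.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
        return true; // a list multimap always grows on put
    }

    // get returns the values in insertion order (a copy here; Guava returns a live view).
    List<String> get(String key) {
        return new ArrayList<>(fields.getOrDefault(key, List.of()));
    }

    // replaceValues swaps in the new values and returns the previous ones.
    List<String> replaceValues(String key, Iterable<? extends String> values) {
        List<String> previous = removeAll(key);
        for (String v : values) {
            put(key, v);
        }
        return previous;
    }

    // removeAll drops every value for the key and returns what was removed.
    List<String> removeAll(Object key) {
        List<String> removed = fields.remove(key);
        return removed == null ? new ArrayList<>() : removed;
    }

    boolean containsEntry(Object key, Object value) {
        List<String> values = fields.get(key);
        return values != null && values.contains(value);
    }

    // size counts key/value pairs, not distinct keys.
    int size() {
        int n = 0;
        for (List<String> values : fields.values()) {
            n += values.size();
        }
        return n;
    }
}
```

        Under these assumptions, calling put twice for the same field yields two values, and size() reports 2 even though only one key exists.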
      • getRawData

        byte[] getRawData()
        Get the raw bytes from which this item was constructed. This is usually used only by the first or second step in the pipeline, i.e. the step that converts the binary form into entries in this map.
        Returns:
        the actual bytes of the document.
      • setRawData

        void setRawData(byte[] rawData)
        Replace the raw bytes. This is only used when the originally indexed document is to be interpreted as a pointer to the "real" document, or when the item is first constructed.
        Parameters:
        rawData - the actual bytes of the document
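        To illustrate the kind of work an early pipeline step does with getRawData(), here is a sketch that turns raw bytes into field entries. It assumes, purely for illustration, that the bytes are UTF-8 text with one key=value pair per line; real inputs (PDF, HTML and so on) need a proper parser. RawDataParserSketch is a hypothetical name, and in practice the resulting entries would be written back with put(field, value).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical first pipeline step: converts raw bytes into field entries.
// Assumes UTF-8 text with one "key=value" pair per line.
class RawDataParserSketch {
    static Map<String, List<String>> parse(byte[] rawData) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        String text = new String(rawData, StandardCharsets.UTF_8);
        for (String line : text.split("\\R")) { // \R matches any line break
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // skip blank lines and lines without a key
            }
            String key = line.substring(0, eq).trim();
            if (key.isEmpty()) {
                continue;
            }
            String value = line.substring(eq + 1).trim();
            // Repeated keys accumulate, matching the multimap-style document.
            fields.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return fields;
    }
}
```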
      • getStatus

        Status getStatus(java.lang.String outputDestination)
        The current processing status of the document relative to a given destination.
        Parameters:
        outputDestination - A destination for which the status is to be reported.
        Returns:
        An enumeration value indicating whether the item is processing, errored out or complete.
        Throws:
        java.lang.IllegalStateException - if the plan has not been set.
      • getStatusMessage

        java.lang.String getStatusMessage(java.lang.String outputStep)
        Get a message relating to the processing status. This typically holds the name of the last successful processor or the error message for the item.
        Parameters:
        outputStep - the output step for which we want to know the message.
        Returns:
        A short message suitable for logging and debugging (not a stack trace)
      • getDelegate

        com.google.common.collect.ListMultimap<java.lang.String, java.lang.String> getDelegate()
      • getId

        java.lang.String getId()
        Returns the identifier for this document. This should be identical to get(getIdField()).
        Returns:
        the id
      • getHash

        java.lang.String getHash()
        A hash based on the contents of the delegate and the raw data.
        Returns:
        a hex string md5 checksum
      • getHashAlg

        default java.lang.String getHashAlg()
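        The exact inputs to getHash() are not specified beyond "the delegate and the raw data", so the following is only a sketch of one reasonable scheme: it sorts field names, feeds each key/value pair and then the raw bytes into java.security.MessageDigest, and hex-encodes the result. The algorithm name is a parameter because getHashAlg() suggests it may be configurable; passing "MD5" matches the documented return value. DocHashSketch and the zero-byte separators are assumptions, not JesterJ's actual encoding.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical content hash over field entries plus raw bytes.
class DocHashSketch {
    static String hash(Map<String, List<String>> fields, byte[] rawData, String algorithm) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported hash algorithm: " + algorithm, e);
        }
        // Sort keys so the hash does not depend on map iteration order.
        for (Map.Entry<String, List<String>> e : new TreeMap<>(fields).entrySet()) {
            for (String value : e.getValue()) {
                digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between key and value
                digest.update(value.getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between entries
            }
        }
        if (rawData != null) {
            digest.update(rawData);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

        Sorting the keys keeps the hash stable when the same fields are added in a different order, while the values within one field keep their order, since in a list multimap that order is part of the content.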
      • getIdField

        java.lang.String getIdField()
      • getSourceScannerName

        java.lang.String getSourceScannerName()
      • getFirstValue

        java.lang.String getFirstValue(java.lang.String fieldName)
      • getParentId

        java.lang.String getParentId()
      • getOrignalParentId

        java.lang.String getOrignalParentId()
      • isStatusChanged

        boolean isStatusChanged()
      • reportDocStatus

        void reportDocStatus()
      • setForceReprocess

        void setForceReprocess(boolean b)
        Ensures that this document will be fed into the plan regardless of memory or hashing settings. Has no effect after the document exits the scanner.
        Parameters:
        b - true if the document should ignore hashing and memory settings.
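        A sketch of the decision that setForceReprocess(true) is meant to override, assuming the scanner remembers the hash of each document from a previous run (null when the document has never been seen). FeedDecisionSketch and shouldFeed are hypothetical; the real scanner logic also depends on its memory and hashing configuration.

```java
// Hypothetical scanner-side check: should this document be fed into the plan?
class FeedDecisionSketch {
    static boolean shouldFeed(String previousHash, String currentHash, boolean forceReprocess) {
        if (forceReprocess) {
            return true; // ignore hashing and memory settings entirely
        }
        if (previousHash == null) {
            return true; // the scanner has no memory of this document
        }
        return !previousHash.equals(currentHash); // feed only if the content changed
    }
}
```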
      • isForceReprocess

        boolean isForceReprocess()
      • setIncompleteOutputDestinations

        void setIncompleteOutputDestinations(java.util.Map<java.lang.String, DocDestinationStatus> value)
      • alreadyHasIncompleteStepList

        boolean alreadyHasIncompleteStepList()
      • isPlanOutput

        boolean isPlanOutput​(java.lang.String stepName)
      • listIncompleteOutputSteps

        java.lang.String listIncompleteOutputSteps()
      • getStatusChange

        DocStatusChange getStatusChange()
        Get the statuses that will be altered if reportDocStatus is invoked.
        Returns:
        a map of status changes.
      • listChangingDestinations

        java.util.List<java.lang.String> listChangingDestinations()
      • getIncompleteOutputDestinations

        java.lang.String[] getIncompleteOutputDestinations()
      • setStatus

        void setStatus(Status status,
                       java.lang.String message,
                       java.io.Serializable... args)
        Set a status for the current downstream steps. The status message may contain '{}' placeholders, which will be replaced by the additional arguments in the same manner as Log4j logging messages. Since document objects must remain serializable, these arguments should typically be reduced to strings if they are not already serializable.

        WARNING: this method has no persistent effect until reportDocStatus() is called. If the system is killed (power loss or any other cause) before reportDocStatus() completes, this status change will not be retained when JesterJ restarts.

        Parameters:
        status - The status to set for the destination step
        message - The user readable message explaining the status change
        args - values to be substituted into the message
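        The following sketch models two points from the description above: '{}' placeholders are filled left to right with the extra arguments, and status changes are only staged until reportDocStatus() runs. Statuses are plain strings here instead of the Status enum, the downstream destinations are supplied to the constructor, and StagedStatusSketch is a hypothetical name; JesterJ's actual persistence is not shown.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of setStatus()/reportDocStatus() staging.
class StagedStatusSketch {
    private final String[] downstream;
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final Map<String, String> persisted = new LinkedHashMap<>();

    StagedStatusSketch(String... downstreamDestinations) {
        this.downstream = downstreamDestinations.clone();
    }

    // Replaces each '{}' in order with String.valueOf(next argument),
    // similar to Log4j parameterized messages. Unmatched '{}' are kept as-is.
    static String format(String message, Serializable... args) {
        StringBuilder out = new StringBuilder();
        int argIndex = 0;
        int i = 0;
        while (i < message.length()) {
            if (argIndex < args.length && message.startsWith("{}", i)) {
                out.append(String.valueOf(args[argIndex++]));
                i += 2;
            } else {
                out.append(message.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Stages the same status for every downstream destination; nothing is persisted yet.
    void setStatus(String status, String message, Serializable... args) {
        String text = status + ": " + format(message, args);
        for (String destination : downstream) {
            pending.put(destination, text);
        }
    }

    // Stands in for reportDocStatus(): staged changes survive only after this runs.
    void reportDocStatus() {
        persisted.putAll(pending);
        pending.clear();
    }

    Map<String, String> persisted() {
        return persisted;
    }
}
```

        If the process dies between setStatus and reportDocStatus, only the pending map is lost, which mirrors the warning above.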
      • removeDownStreamOutputStep

        void removeDownStreamOutputStep(Router routerBase,
                                        java.lang.String name)
        Remove a potential downstream step. This should only be performed by routers and step infrastructure, hence the router argument. If you find a way to call this from your processor, please report a bug in our issue tracker.
        Parameters:
        routerBase - The router for the step in which the removal takes place.
        name - the name of the step to remove
      • dumpStatus

        java.lang.String dumpStatus()
        Get a string representation of the current status information. For debugging only.
        Returns:
        status information in string form.
      • getOrigination

        java.lang.String getOrigination()
        Identify whether the document originated from fault tolerance or from a scan.
        Returns:
        the source type for the document for debugging.
      • removeAllOtherDestinationsQuietly

        void removeAllOtherDestinationsQuietly(java.util.Set<java.lang.String> outputDestinationNames)
        For use during routing, to remove destinations from document duplicates before distribution to downstream steps. This is only necessary for routers that route to more than one step and therefore must clone documents. The clones need to be adjusted so that they do not carry statuses for destinations that are not downstream from their immediate targets, without causing updates to the removed destinations (which might still be serviced by another clone). The returned destinations need to be aggregated and inspected to subsequently determine whether the routing is in effect dropping some destinations. For example, a router that sends 2 copies to 2 out of 3 possible downstream steps must issue DROPPED status updates only for the 3rd step, which does not get a clone of the original document.
        Parameters:
        outputDestinationNames - the correct destinations at which this clone will be targeted
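        The bookkeeping in the example above can be sketched as set arithmetic. Each clone keeps only the destinations reachable from its target step (the set passed to removeAllOtherDestinationsQuietly), and any destination that no clone keeps is the one that should receive a DROPPED status. DropCalculationSketch, and representing reachability as a map from target step name to destination names, are assumptions for illustration only.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical router bookkeeping around removeAllOtherDestinationsQuietly().
class DropCalculationSketch {
    // Destinations one clone should keep: those both on the document and
    // reachable from the clone's target step.
    static Set<String> retainedFor(Set<String> documentDestinations, Set<String> reachableFromTarget) {
        Set<String> retained = new TreeSet<>(documentDestinations);
        retained.retainAll(reachableFromTarget);
        return retained;
    }

    // Destinations no clone will service: these need DROPPED status updates.
    static Set<String> droppedDestinations(Set<String> allDestinations,
                                           Map<String, Set<String>> reachableByTargetStep) {
        Set<String> kept = new HashSet<>();
        for (Set<String> reachable : reachableByTargetStep.values()) {
            kept.addAll(reachable);
        }
        Set<String> dropped = new TreeSet<>(allDestinations);
        dropped.removeAll(kept);
        return dropped;
    }
}
```

        For the three-destination example, routing clones to steps that reach A and B leaves {C} as the only dropped destination.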