Class FieldTemplateProcessor

  • All Implemented Interfaces:
    Configurable, DocumentProcessor

    public class FieldTemplateProcessor
    extends java.lang.Object
    implements DocumentProcessor
    Interprets the value of a field as a Velocity template, using the document as the context. If the field has multiple values, every value is interpreted and replaced. Keep in mind that the fields referenced from within a template can also hold multiple values, so one usually wants to write $foobar[0], not $foobar. The latter is replaced with [foo] if the field holds a single value, or with [foo,bar,baz] if it currently holds three values.
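
The difference between referencing a whole field and referencing one of its values can be illustrated without Velocity at all. The following is only a minimal sketch using plain Java collections: the class and helper names are hypothetical, and it does not call Velocity or JesterJ. It assumes a multi-valued field behaves like a list whose string form is what ends up in the output; the exact separator spacing in real output depends on how the field's underlying collection is rendered.

```java
import java.util.List;
import java.util.Map;

// Stand-alone illustration only: this does not call Velocity or JesterJ.
// It mimics how a multi-valued field renders when referenced whole versus by index.
final class MultiValueRenderingSketch {

    // Hypothetical helper: render the whole field, analogous to $foobar.
    static String renderWhole(Map<String, List<String>> doc, String field) {
        return String.valueOf(doc.get(field));
    }

    // Hypothetical helper: render only the first value, analogous to $foobar[0].
    static String renderFirst(Map<String, List<String>> doc, String field) {
        return doc.get(field).get(0);
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = Map.of("foobar", List.of("foo", "bar", "baz"));
        System.out.println(renderWhole(doc, "foobar")); // [foo, bar, baz]
        System.out.println(renderFirst(doc, "foobar")); // foo
    }
}
```

Referencing the whole field yields the bracketed list form even when only one value is present, which is rarely what a template author wants in the final text.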

     

    WARNING: this processor uses the Velocity templating engine, which is a powerful but potentially dangerous technique! You must ensure that the template is NOT derived from, and does NOT CONTAIN, any text provided by users or other untrustworthy sources before it is interpreted by this processor. If you allow user data to be interpreted as a template, you have given the user the ability to run ARBITRARY code on the ingestion infrastructure. Recommended usages include specifying the template field as a statically defined field, or drawing it from a known, controlled and curated database of templates. Users are also strongly cautioned against chaining multiple instances of this step, since each additional instance makes it much more difficult to ensure that user-controlled data is not added to the template and then subsequently interpreted. With great power comes great responsibility. Don't run with scissors... you have been warned!

    • Constructor Detail

      • FieldTemplateProcessor

        public FieldTemplateProcessor()
    • Method Detail

      • getName

        public java.lang.String getName()
        Description copied from interface: Configurable
        A name for this object to distinguish it from other objects. This value is generally supplied by the plan author. Every object in a plan must have a unique name that begins with a letter and contains only letters, digits, underscores and periods.
        Specified by:
        getName in interface Configurable
        Returns:
        The user supplied name for this step
      • processDocument

        public Document[] processDocument(Document document)
        Description copied from interface: DocumentProcessor
        Mutate, validate or transmit a document. Implementations must not throw any Throwable that is not a JVM Error, and should be written expecting that the code might be interrupted at any point. Practically, this means document processors should perform no more than one persistent or externally visible action, and that action should be transactional. Large, complex processors that write to disk, a DB, or elsewhere multiple times run the risk of partial completion. Similarly, since JesterJ is a long-running system, it will often cease operation due to unexpected outages (power cord, etc.), so it is not a good idea to hold resources that require an explicit release or "return". "Check then write" is of course a performance anti-pattern with respect to external networked or disk resources, since network and disk I/O are typically slow.

        Processors should feel free to set the status of a document and add a status message via Document.setStatus(Status, String, java.io.Serializable...); however, the easiest way to communicate a failure (one that makes all further processing erroneous) is to simply throw a runtime exception.

        The document processor has no need to add the document to the next step in the plan. This is handled by the infrastructure in StepImpl based on the status of the document, so long as the document is emitted via the return value of this method. If the document enters via the parameters and is not emitted for any reason, the processor MUST set an appropriate status before the end of this method, though it is preferable to just set the status and emit it.
        Specified by:
        processDocument in interface DocumentProcessor
        Parameters:
        document - the item to process
        Returns:
        The documents that result from the processing in this step. Documents with status of Status.PROCESSING will be processed by subsequent steps, and documents with any other status will have their status recorded and will not be processed by subsequent steps.
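
The contract described for processDocument can be sketched as follows. This is an illustrative sketch only, under stated assumptions: the nested Document and DocumentProcessor types are simplified stand-ins written so the example compiles on its own, not the real JesterJ interfaces, and getFirstValue, SimpleDocument and RequireFieldProcessor are hypothetical names. The processor validates without any external writes, signals failure with a runtime exception, and emits the document through the return value.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins so this sketch compiles on its own.
// They are NOT the real JesterJ types, which have many more members.
final class ProcessorContractSketch {

    interface Document {
        String getFirstValue(String field); // assumed accessor, for illustration only
    }

    interface DocumentProcessor {
        Document[] processDocument(Document document);
    }

    // Hypothetical in-memory document holding one value per field.
    static final class SimpleDocument implements Document {
        private final Map<String, String> fields = new HashMap<>();

        SimpleDocument with(String field, String value) {
            fields.put(field, value);
            return this;
        }

        @Override
        public String getFirstValue(String field) {
            return fields.get(field);
        }
    }

    // Hypothetical processor: it only validates, so it performs no external
    // writes and holds no resources that would need an explicit release.
    static final class RequireFieldProcessor implements DocumentProcessor {
        private final String requiredField;

        RequireFieldProcessor(String requiredField) {
            this.requiredField = requiredField;
        }

        @Override
        public Document[] processDocument(Document document) {
            if (document.getFirstValue(requiredField) == null) {
                // Simplest failure signal: a runtime exception.
                throw new IllegalStateException("Missing required field: " + requiredField);
            }
            // Emit the document via the return value rather than passing it
            // to the next step directly.
            return new Document[] { document };
        }
    }

    public static void main(String[] args) {
        DocumentProcessor step = new RequireFieldProcessor("title");
        Document[] out = step.processDocument(new SimpleDocument().with("title", "hello"));
        System.out.println(out.length); // 1
    }
}
```

Because the sketch has no side effects, an interruption at any point leaves nothing half-written, which is the property the description asks implementations to preserve.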
      • setName

        public void setName(java.lang.String name)
      • setTemplateField

        public void setTemplateField(java.lang.String templateField)