@TriggerSerially @Stateful(scopes={LOCAL,CLUSTER}, description="After a listing of resources is performed, the latest timestamp of any of the resources is stored in the component's state. The scope used depends on the implementation.") public abstract class AbstractListProcessor<T extends ListableEntity> extends AbstractProcessor
An Abstract Processor that is intended to simplify the coding required in order to perform Listing operations of remote or local resources. Those resources may be files, "objects", "messages", or any other sort of entity that may need to be listed in such a way that we identify the entity only once. Each of these objects, messages, etc. is referred to as an "entity" for the scope of this Processor.
This class is responsible for triggering the listing to occur, filtering the results returned such that only new (unlisted) entities or entities that have been modified will be emitted from the Processor.
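The core filtering rule can be sketched without any NiFi dependencies. This minimal example assumes a hypothetical Entity class carrying an identifier and a millisecond timestamp, and treats an entity as new only if its timestamp is strictly later than the latest one recorded on a previous run; a modified entity qualifies again because its timestamp moves forward. Judging by the parameters of persist(...), the real processor also appears to track the identifiers that share the latest timestamp, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

class NewEntityFilter {

    // Hypothetical stand-in for a ListableEntity: an identifier plus a timestamp.
    static final class Entity {
        final String identifier;
        final long timestampMillis;

        Entity(String identifier, long timestampMillis) {
            this.identifier = identifier;
            this.timestampMillis = timestampMillis;
        }

        @Override
        public String toString() {
            return identifier + "@" + timestampMillis;
        }
    }

    /**
     * Keeps only entities whose timestamp is strictly later than the latest
     * timestamp recorded on a previous run (null means nothing was listed yet).
     */
    static List<Entity> filterNew(List<Entity> listing, Long lastListedTimestamp) {
        List<Entity> result = new ArrayList<>();
        for (Entity entity : listing) {
            if (lastListedTimestamp == null || entity.timestampMillis > lastListedTimestamp) {
                result.add(entity);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entity> listing = List.of(
                new Entity("a.txt", 1_000L),
                new Entity("b.txt", 2_000L),
                new Entity("c.txt", 3_000L));
        System.out.println(filterNew(listing, null));   // [a.txt@1000, b.txt@2000, c.txt@3000]
        System.out.println(filterNew(listing, 2_000L)); // [c.txt@3000]
    }
}
```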
In order to make use of this abstract class, the entities listed must meet the following criteria:
This class persists state across restarts so that, even if NiFi is restarted, duplicates will not be pulled from the target system given the above criteria. This is performed using the StateManager, which allows the system to be restarted and begin processing where it left off. The state that is stored is the latest timestamp that has been pulled (as determined by the timestamps of the entities that are returned). See the section above for information about how this information is used in order to determine new entities.
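Because StateManager state is a map of string keys to string values, the latest timestamp has to be written and read back as text. This minimal sketch assumes a hypothetical in-memory StateStore in place of the StateManager, and the key name "listing.timestamp" is an assumption (the real key is held in LATEST_LISTED_ENTRY_TIMESTAMP_KEY, whose value this page does not show).

```java
import java.util.HashMap;
import java.util.Map;

class TimestampStateSketch {

    // Hypothetical key name; the real class keeps its key in LATEST_LISTED_ENTRY_TIMESTAMP_KEY.
    static final String LATEST_TIMESTAMP_KEY = "listing.timestamp";

    // Hypothetical stand-in for durable state: a Map<String, String> that is
    // replaced as a whole on every write, so the timestamp is stored as a decimal string.
    static final class StateStore {
        private Map<String, String> state = new HashMap<>();

        Map<String, String> get() {
            return new HashMap<>(state);
        }

        void set(Map<String, String> newState) {
            state = new HashMap<>(newState);
        }
    }

    /** Reads the last stored timestamp, or null if nothing has been listed yet. */
    static Long restore(StateStore store) {
        String value = store.get().get(LATEST_TIMESTAMP_KEY);
        return value == null ? null : Long.parseLong(value);
    }

    /** Stores the newest timestamp seen in this listing cycle. */
    static void persist(StateStore store, long latestTimestampMillis) {
        Map<String, String> newState = store.get();
        newState.put(LATEST_TIMESTAMP_KEY, Long.toString(latestTimestampMillis));
        store.set(newState);
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        System.out.println(restore(store)); // null: first start, nothing listed yet
        persist(store, 1_700_000_000_000L);
        // After a "restart" the processor reloads this value instead of starting over.
        System.out.println(restore(store)); // 1700000000000
    }
}
```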
NOTE: This processor migrates state from the legacy mechanisms (locally stored, file-based state and, optionally, state held through the Distributed Cache Service property) to the new StateManager functionality. Upon successful migration, the associated data from one or both of the legacy mechanisms is purged.
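The migration described in the NOTE amounts to a copy-then-purge step. This is a minimal sketch under simplifying assumptions: both the legacy store and the new state are plain string maps, a single timestamp under a hypothetical "listing.timestamp" key is all that moves, and an existing value in the new state wins. The real processor reads legacy data as JSON through deserialize(String) into an EntityListing and can consult the Distributed Cache Service, neither of which is modeled here.

```java
import java.util.HashMap;
import java.util.Map;

class LegacyStateMigrationSketch {

    // Hypothetical key name, used for both stores in this sketch.
    static final String KEY = "listing.timestamp";

    /**
     * Copies the legacy timestamp into the new state if the new state has none,
     * then purges the legacy entry. Returns true if a value was copied.
     */
    static boolean migrate(Map<String, String> legacyStore, Map<String, String> newState) {
        String legacyValue = legacyStore.get(KEY);
        if (legacyValue == null) {
            return false; // nothing to migrate
        }
        boolean copied = false;
        if (!newState.containsKey(KEY)) {
            newState.put(KEY, legacyValue);
            copied = true;
        }
        // The new state now holds a value, so the legacy copy can be purged.
        legacyStore.remove(KEY);
        return copied;
    }

    public static void main(String[] args) {
        Map<String, String> legacy = new HashMap<>(Map.of(KEY, "1000"));
        Map<String, String> current = new HashMap<>();
        System.out.println(migrate(legacy, current)); // true
        System.out.println(current);                  // {listing.timestamp=1000}
        System.out.println(legacy.isEmpty());         // true
    }
}
```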
For each new entity that is listed, the Processor will send a FlowFile to the 'success' relationship. The FlowFile will have no content but will have some set of attributes (defined by the concrete implementation) that can be used to fetch those resources or interact with them in whatever way makes sense for the configured dataflow.
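Because the emitted FlowFile has no content, everything a downstream step needs must travel in its attributes. In this sketch, ListedFlowFile is a hypothetical stand-in for a FlowFile, and the "filename" and "path" attribute names are illustrative only (each concrete processor chooses its own in createAttributes); a fetch-style step then rebuilds the resource location from the attributes alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ListingFlowFileSketch {

    // Hypothetical stand-in for a FlowFile: attributes plus (empty) content.
    static final class ListedFlowFile {
        final Map<String, String> attributes;
        final byte[] content = new byte[0]; // listing FlowFiles carry no content

        ListedFlowFile(Map<String, String> attributes) {
            this.attributes = Map.copyOf(attributes);
        }
    }

    /** Creates one attribute-only FlowFile per newly listed name. */
    static List<ListedFlowFile> toFlowFiles(List<String> newFileNames, String path) {
        List<ListedFlowFile> out = new ArrayList<>();
        for (String name : newFileNames) {
            // Illustrative attribute names; a real subclass defines these in createAttributes.
            out.add(new ListedFlowFile(Map.of("filename", name, "path", path)));
        }
        return out;
    }

    /** A downstream step rebuilds the resource location from attributes only. */
    static String locationOf(ListedFlowFile flowFile) {
        return flowFile.attributes.get("path") + "/" + flowFile.attributes.get("filename");
    }

    public static void main(String[] args) {
        List<ListedFlowFile> flowFiles = toFlowFiles(List.of("a.csv", "b.csv"), "/data/in");
        System.out.println(flowFiles.size());                // 2
        System.out.println(locationOf(flowFiles.get(0)));    // /data/in/a.csv
        System.out.println(flowFiles.get(0).content.length); // 0
    }
}
```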
Subclasses are responsible for the following:
- Implementing the performListing(ProcessContext, Long) method, which creates a listing of all entities on the target system that have timestamps later than the provided timestamp. If the entities returned have a timestamp before the provided one, those entities will be filtered out. It is therefore not necessary for the implementation to filter by timestamp; the timestamp is provided so that the implementation can filter resources on the server side rather than pulling back all of the information, if it makes sense to do so in the concrete implementation.
- Implementing the createAttributes(ListableEntity, ProcessContext) method, which creates a Map of attributes that should be applied to the FlowFile to represent an entity.
- Implementing the getPath(ProcessContext) method, which is responsible for returning the path that is currently being polled for entities. If this concept does not apply for the concrete implementation, it is recommended that the concrete implementation return "." or "/" for all invocations of this method.
- Implementing the isListingResetNecessary(PropertyDescriptor) method, which is responsible for determining when the listing needs to be reset by returning a boolean indicating whether or not a change in the value of the provided property should trigger the timestamp and identifier information to be cleared.
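The subclass responsibilities can be mirrored in a small, NiFi-free skeleton. Everything here is hypothetical: MiniListProcessor stands in for AbstractListProcessor, Listable for ListableEntity, a property is reduced to its name (the "Directory" property is invented), and the last timestamp lives only in memory rather than in the StateManager. It shows the base class re-filtering whatever performListing returns, getPath returning "/" where a path does not apply, and isListingResetNecessary driving a reset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ListProcessorSkeleton {

    // Hypothetical stand-in for ListableEntity.
    interface Listable {
        String getName();
        long getTimestamp();
    }

    // NiFi-free mirror of the subclass hooks; real signatures take ProcessContext/PropertyDescriptor.
    abstract static class MiniListProcessor<T extends Listable> {
        private Long lastTimestamp; // in memory only; the real class uses the StateManager

        protected abstract List<T> performListing(Long minTimestamp);
        protected abstract Map<String, String> createAttributes(T entity);
        protected abstract String getPath();
        protected abstract boolean isListingResetNecessary(String propertyName);

        /** One cycle: list, drop already-listed entities, return attributes for the rest. */
        List<Map<String, String>> runOnce() {
            List<Map<String, String>> emitted = new ArrayList<>();
            Long newest = lastTimestamp;
            for (T entity : performListing(lastTimestamp)) {
                long ts = entity.getTimestamp();
                // Filtering is repeated here, so performListing may ignore minTimestamp.
                if (lastTimestamp != null && ts <= lastTimestamp) {
                    continue;
                }
                emitted.add(createAttributes(entity));
                newest = (newest == null) ? ts : Math.max(newest, ts);
            }
            lastTimestamp = newest;
            return emitted;
        }

        /** Clears the stored timestamp when the subclass says the property matters. */
        void onPropertyModified(String propertyName) {
            if (isListingResetNecessary(propertyName)) {
                lastTimestamp = null;
            }
        }
    }

    static final class Entry implements Listable {
        private final String name;
        private final long timestamp;

        Entry(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }

        public String getName() { return name; }
        public long getTimestamp() { return timestamp; }
    }

    // Example subclass over an in-memory "directory".
    static final class InMemoryLister extends MiniListProcessor<Entry> {
        final List<Entry> directory = new ArrayList<>();

        @Override protected List<Entry> performListing(Long minTimestamp) {
            return new ArrayList<>(directory); // no server-side filtering
        }

        @Override protected Map<String, String> createAttributes(Entry e) {
            return Map.of("filename", e.getName(), "path", getPath());
        }

        @Override protected String getPath() {
            return "/"; // a path does not really apply to an in-memory list
        }

        @Override protected boolean isListingResetNecessary(String propertyName) {
            return "Directory".equals(propertyName);
        }
    }

    public static void main(String[] args) {
        InMemoryLister lister = new InMemoryLister();
        lister.directory.add(new Entry("a.txt", 100));
        lister.directory.add(new Entry("b.txt", 200));
        System.out.println(lister.runOnce().size()); // 2
        System.out.println(lister.runOnce().size()); // 0
        lister.directory.add(new Entry("c.txt", 300));
        System.out.println(lister.runOnce().size()); // 1
        lister.onPropertyModified("Directory");
        System.out.println(lister.runOnce().size()); // 3
    }
}
```

After the reset, every entity in the directory is listed again, which is the intended effect of clearing the stored timestamp.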
| Modifier and Type | Class and Description |
|---|---|
| `private static class` | `AbstractListProcessor.StringSerDe` |
| Constructor and Description |
|---|
| `AbstractListProcessor()` |
| Modifier and Type | Method and Description |
|---|---|
| `protected abstract Map<String,String>` | `createAttributes(T entity, ProcessContext context)`: Creates a Map of attributes that should be applied to the FlowFile to represent this entity. |
| `protected ListedEntityTracker<T>` | `createListedEntityTracker()` |
| `protected Collection<ValidationResult>` | `customValidate(ValidationContext context)`: In order to add custom validation at sub-classes, implement the `customValidate(ValidationContext, Collection)` method. |
| `protected void` | `customValidate(ValidationContext validationContext, Collection<ValidationResult> validationResults)`: Sub-classes can add custom validation by implementing this method. |
| `private EntityListing` | `deserialize(String serializedState)` |
| `protected String` | `getDefaultTimePrecision()`: This method is intended to be overridden by subclasses that do not support the TARGET_SYSTEM_TIMESTAMP_PRECISION property. |
| `protected String` | `getKey(String directory)` |
| `protected abstract String` | `getPath(ProcessContext context)`: Returns the path to perform a listing on. |
| `File` | `getPersistenceFile()` |
| `Set<Relationship>` | `getRelationships()` |
| `protected abstract Scope` | `getStateScope(PropertyContext context)`: Returns a Scope that specifies where the state should be managed for this Processor. |
| `void` | `initListedEntityTracker(ProcessContext context)` |
| `protected abstract boolean` | `isListingResetNecessary(PropertyDescriptor property)`: Determines whether or not the listing must be reset if the value of the given property is changed. |
| `private void` | `listByTrackingEntities(ProcessContext context, ProcessSession session)` |
| `void` | `listByTrackingTimestamps(ProcessContext context, ProcessSession session)` |
| `private void` | `migrateState(String path, DistributedMapCacheClient client, StateManager stateManager, Scope scope)`: This processor used to use the DistributedMapCacheClient in order to store cluster-wide state, before the introduction of the StateManager. |
| `void` | `onPrimaryNodeChange(PrimaryNodeState newState)` |
| `void` | `onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue)` |
| `void` | `onTrigger(ProcessContext context, ProcessSession session)` |
| `protected abstract List<T>` | `performListing(ProcessContext context, Long minTimestamp)`: Performs a listing of the remote entities that can be pulled. |
| `private void` | `persist(long latestListedEntryTimestampThisCycleMillis, long lastProcessedLatestEntryTimestampMillis, List<String> processedIdentifiesWithLatestTimestamp, StateManager stateManager, Scope scope)` |
| `private void` | `resetTimeStates()` |
| `void` | `updateState(ProcessContext context)` |
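The summary lists two listing paths, listByTrackingTimestamps and listByTrackingEntities, which presumably correspond to the BY_TIMESTAMPS and BY_ENTITIES values of LISTING_STRATEGY. The entity-tracking idea can be sketched as remembering every emitted identifier; this minimal version assumes a hypothetical Entity with an identifier, timestamp and size, and treats an entity as modified when its timestamp or size changes. The actual criteria and storage used by ListedEntityTracker are not described on this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EntityTrackingSketch {

    // Hypothetical entity: identifier, timestamp and size.
    static final class Entity {
        final String identifier;
        final long timestamp;
        final long size;

        Entity(String identifier, long timestamp, long size) {
            this.identifier = identifier;
            this.timestamp = timestamp;
            this.size = size;
        }
    }

    // identifier -> {timestamp, size} of the version last emitted
    private final Map<String, long[]> alreadyListed = new HashMap<>();

    /** Emits entities that were never seen, or whose timestamp or size changed. */
    List<Entity> track(List<Entity> listing) {
        List<Entity> emitted = new ArrayList<>();
        for (Entity e : listing) {
            long[] previous = alreadyListed.get(e.identifier);
            boolean isNew = previous == null;
            boolean isModified = !isNew && (previous[0] != e.timestamp || previous[1] != e.size);
            if (isNew || isModified) {
                emitted.add(e);
                alreadyListed.put(e.identifier, new long[] {e.timestamp, e.size});
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        EntityTrackingSketch tracker = new EntityTrackingSketch();
        Entity a = new Entity("a", 100, 10);
        Entity b = new Entity("b", 100, 20);
        System.out.println(tracker.track(List.of(a, b)).size()); // 2: both are new
        System.out.println(tracker.track(List.of(a, b)).size()); // 0: nothing changed
        // "c" arrives late with an older timestamp; identifier tracking still emits it.
        Entity c = new Entity("c", 50, 5);
        System.out.println(tracker.track(List.of(a, b, c)).size()); // 1
        // "a" is rewritten with a new size but the same timestamp.
        System.out.println(tracker.track(List.of(new Entity("a", 100, 11), b, c)).size()); // 1
    }
}
```

The late-arriving "c" is the case a pure timestamp strategy would miss, since its timestamp is not later than the latest one already recorded.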
Methods inherited from class AbstractProcessor: onTrigger
Methods inherited from class AbstractSessionFactoryProcessor: getControllerServiceLookup, getIdentifier, getLogger, getNodeTypeProvider, init, initialize, isConfigurationRestored, isScheduled, toString, updateConfiguredRestoredTrue, updateScheduledFalse, updateScheduledTrue
Methods inherited from class AbstractConfigurableComponent: equals, getPropertyDescriptor, getPropertyDescriptors, getSupportedDynamicPropertyDescriptor, getSupportedPropertyDescriptors, hashCode, validate
Methods inherited from class Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface ConfigurableComponent: getPropertyDescriptor, getPropertyDescriptors, validate

public static final PropertyDescriptor DISTRIBUTED_CACHE_SERVICE
public static final AllowableValue PRECISION_AUTO_DETECT
public static final AllowableValue PRECISION_MILLIS
public static final AllowableValue PRECISION_SECONDS
public static final AllowableValue PRECISION_MINUTES
public static final PropertyDescriptor TARGET_SYSTEM_TIMESTAMP_PRECISION
public static final Relationship REL_SUCCESS
public static final AllowableValue BY_TIMESTAMPS
public static final AllowableValue BY_ENTITIES
public static final PropertyDescriptor LISTING_STRATEGY
private volatile Long lastListedLatestEntryTimestampMillis
private volatile Long lastProcessedLatestEntryTimestampMillis
private volatile Long lastRunTimeNanos
private volatile boolean justElectedPrimaryNode
private volatile boolean resetState
private volatile boolean resetEntityTrackingState
private volatile ListedEntityTracker<T extends ListableEntity> listedEntityTracker
static final String LATEST_LISTED_ENTRY_TIMESTAMP_KEY
static final String LAST_PROCESSED_LATEST_ENTRY_TIMESTAMP_KEY
static final String IDENTIFIER_PREFIX
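The fields above include TARGET_SYSTEM_TIMESTAMP_PRECISION with PRECISION_MILLIS, PRECISION_SECONDS and PRECISION_MINUTES (plus PRECISION_AUTO_DETECT). This page does not spell out how precision is applied, so the following is only a sketch of one plausible rule: when the target system reports coarse timestamps, an entity whose truncated timestamp equals the truncated current time is held back, because a later write within the same unit would share that timestamp and be skipped once it is recorded as the latest. The rule and method names are assumptions, and auto-detection is not modeled.

```java
import java.util.concurrent.TimeUnit;

class TimestampPrecisionSketch {

    /** Truncates an epoch-millis timestamp down to a whole number of the given unit. */
    static long truncate(long epochMillis, TimeUnit precision) {
        long unitMillis = precision.toMillis(1);
        return Math.floorDiv(epochMillis, unitMillis) * unitMillis;
    }

    /**
     * Assumed rule: list an entity only once its truncated timestamp is strictly
     * older than the truncated current time, so a later write in the same unit
     * cannot share the timestamp that gets recorded as "latest listed".
     */
    static boolean safeToList(long entityMillis, long nowMillis, TimeUnit precision) {
        return truncate(entityMillis, precision) < truncate(nowMillis, precision);
    }

    public static void main(String[] args) {
        long now = 125_500L; // 125.5 seconds after the epoch, for readability
        System.out.println(truncate(now, TimeUnit.SECONDS));                  // 125000
        System.out.println(safeToList(125_100L, now, TimeUnit.SECONDS));      // false
        System.out.println(safeToList(124_900L, now, TimeUnit.SECONDS));      // true
        System.out.println(safeToList(125_100L, now, TimeUnit.MILLISECONDS)); // true
    }
}
```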
public File getPersistenceFile()
public void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue)
Specified by: onPropertyModified in interface ConfigurableComponent
Overrides: onPropertyModified in class AbstractConfigurableComponent
public Set<Relationship> getRelationships()
Specified by: getRelationships in interface Processor
Overrides: getRelationships in class AbstractSessionFactoryProcessor
protected final Collection<ValidationResult> customValidate(ValidationContext context)
In order to add custom validation at sub-classes, implement the customValidate(ValidationContext, Collection) method.
Overrides: customValidate in class AbstractConfigurableComponent
protected void customValidate(ValidationContext validationContext, Collection<ValidationResult> validationResults)
Parameters: validationContext - the validation context; validationResults - add custom validation result to this collection
@OnPrimaryNodeStateChange public void onPrimaryNodeChange(PrimaryNodeState newState)
@OnScheduled public final void updateState(ProcessContext context) throws IOException
Throws: IOException
private void migrateState(String path, DistributedMapCacheClient client, StateManager stateManager, Scope scope) throws IOException
Parameters: path - the path to migrate state for; client - the DistributedMapCacheClient that is capable of obtaining the current state; stateManager - the StateManager to use in order to store the new state; scope - the scope to use
Throws: IOException - if unable to retrieve or store the state
private void persist(long latestListedEntryTimestampThisCycleMillis, long lastProcessedLatestEntryTimestampMillis, List<String> processedIdentifiesWithLatestTimestamp, StateManager stateManager, Scope scope) throws IOException
Throws: IOException
private EntityListing deserialize(String serializedState) throws com.fasterxml.jackson.core.JsonParseException, com.fasterxml.jackson.databind.JsonMappingException, IOException
Throws: com.fasterxml.jackson.core.JsonParseException, com.fasterxml.jackson.databind.JsonMappingException, IOException
public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException
Specified by: onTrigger in class AbstractProcessor
Throws: ProcessException
public void listByTrackingTimestamps(ProcessContext context, ProcessSession session) throws ProcessException
Throws: ProcessException
protected String getDefaultTimePrecision()
private void resetTimeStates()
protected abstract Map<String,String> createAttributes(T entity, ProcessContext context)
Parameters: entity - the entity represented by the FlowFile; context - the ProcessContext for obtaining configuration information
protected abstract String getPath(ProcessContext context)
Parameters: context - the ProcessContext to use in order to obtain configuration
Returns: the path to perform a listing on, or null if not applicable.
protected abstract List<T> performListing(ProcessContext context, Long minTimestamp) throws IOException
Parameters: context - the ProcessContext to use in order to pull the appropriate entities; minTimestamp - the minimum timestamp of entities that should be returned.
Throws: IOException
protected abstract boolean isListingResetNecessary(PropertyDescriptor property)
Parameters: property - the property that has changed
Returns: true if a change in value of the given property necessitates that the listing be reset, false otherwise.
protected abstract Scope getStateScope(PropertyContext context)
Parameters: context - the ProcessContext to use in order to make a determination
@OnScheduled public void initListedEntityTracker(ProcessContext context)
protected ListedEntityTracker<T> createListedEntityTracker()
private void listByTrackingEntities(ProcessContext context, ProcessSession session) throws ProcessException
Throws: ProcessException
Copyright © 2019 Apache NiFi Project. All rights reserved.