@TriggerSerially
@InputRequirement(value=INPUT_FORBIDDEN)
@Tags(value={"file","get","list","ingest","source","filesystem"})
@CapabilityDescription(value="Retrieves a listing of files from the input directory. For each file listed, creates a FlowFile that represents the file so that it can be fetched in conjunction with FetchFile. This Processor is designed to run on Primary Node only in a cluster when \'Input Directory Location\' is set to \'Remote\'. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all the data. When \'Input Directory Location\' is \'Local\', the \'Execution\' mode can be anything, and synchronization won\'t happen. Unlike GetFile, this Processor does not delete any data from the local filesystem.")
@WritesAttribute(attribute="filename",description="The name of the file that was read from filesystem.")
@WritesAttribute(attribute="path",description="The path is set to the relative path of the file\'s directory on filesystem compared to the Input Directory property. For example, if Input Directory is set to /tmp, then files picked up from /tmp will have the path attribute set to \"/\". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to \"abc/1/2/3/\".")
@WritesAttribute(attribute="absolute.path",description="The absolute.path is set to the absolute path of the file\'s directory on filesystem. For example, if the Input Directory property is set to /tmp, then files picked up from /tmp will have the path attribute set to \"/tmp/\". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to \"/tmp/abc/1/2/3/\".")
@WritesAttribute(attribute="file.owner",description="The user that owns the file in filesystem")
@WritesAttribute(attribute="file.group",description="The group that owns the file in filesystem")
@WritesAttribute(attribute="file.size",description="The number of bytes in the file in filesystem")
@WritesAttribute(attribute="file.permissions",description="The permissions for the file in filesystem. This is formatted as 3 characters for the owner, 3 for the group, and 3 for other users. For example rw-rw-r--")
@WritesAttribute(attribute="file.lastModifiedTime",description="The timestamp of when the file in filesystem was last modified as \'yyyy-MM-dd\'T\'HH:mm:ssZ\'")
@WritesAttribute(attribute="file.lastAccessTime",description="The timestamp of when the file in filesystem was last accessed as \'yyyy-MM-dd\'T\'HH:mm:ssZ\'")
@WritesAttribute(attribute="file.creationTime",description="The timestamp of when the file in filesystem was created as \'yyyy-MM-dd\'T\'HH:mm:ssZ\'")
@SeeAlso(value={GetFile.class,PutFile.class,FetchFile.class})
@Stateful(scopes={LOCAL,CLUSTER}, description="After performing a listing of files, the timestamp of the newest file is stored. This allows the Processor to list only files that have been added or modified after this date the next time that the Processor is run. Whether the state is stored with a Local or Cluster scope depends on the value of the <Input Directory Location> property.")
@DefaultSchedule(strategy=TIMER_DRIVEN, period="1 min")
public class ListFile extends AbstractListProcessor<FileInfo>
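
The @Stateful description above says that the timestamp of the newest file is stored so that the next run lists only files added or modified after it. The following is a minimal sketch of that idea under stated assumptions, not the processor's implementation: the class TimestampListingSketch and its method listNewFiles are hypothetical names, the state lives in a plain field rather than in a Local or Cluster state store, and edge cases such as several files sharing the newest timestamp are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of timestamp-based listing state; not NiFi's AbstractListProcessor. */
final class TimestampListingSketch {

    // Last-modified time (epoch millis) of the newest file seen so far; null before the first run.
    private Long newestSeenMillis;

    /**
     * Returns the names of files modified after the stored timestamp,
     * then advances the stored timestamp to the newest file in this listing.
     */
    List<String> listNewFiles(Map<String, Long> lastModifiedByName) {
        List<String> fresh = new ArrayList<>();
        Long newest = newestSeenMillis;
        // A TreeMap copy gives a deterministic, name-sorted iteration order.
        for (Map.Entry<String, Long> entry : new TreeMap<>(lastModifiedByName).entrySet()) {
            long modified = entry.getValue();
            if (newestSeenMillis == null || modified > newestSeenMillis) {
                fresh.add(entry.getKey());
            }
            if (newest == null || modified > newest) {
                newest = modified;
            }
        }
        newestSeenMillis = newest;
        return fresh;
    }

    public static void main(String[] args) {
        TimestampListingSketch lister = new TimestampListingSketch();
        // First run: nothing is stored yet, so every file is listed.
        System.out.println(lister.listNewFiles(Map.of("a.txt", 1_000L, "b.txt", 2_000L)));
        // Second run: only files newer than the stored timestamp (2000) are listed.
        System.out.println(lister.listNewFiles(Map.of("a.txt", 1_000L, "b.txt", 2_500L, "c.txt", 3_000L)));
    }
}
```

In this sketch an unchanged file (a.txt) is skipped on the second run, while a modified file (b.txt) and a new file (c.txt) are listed again.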
| Modifier and Type | Class and Description |
|---|---|
| private static class | ListFile.DiskOperation |
| (package private) static class | ListFile.MonitorActiveTasks |
| (package private) static interface | ListFile.OperationStatistics |
| (package private) static interface | ListFile.PerformanceTracker: PerformanceTracker is responsible for providing a mechanism by which any disk operation can be timed and the timing information can both be used to issue warnings as well as be aggregated for some amount of time, in order to understand how long certain disk operations take and which files may be responsible for causing longer-than-usual operations to be performed. |
| private static class | ListFile.ProcessorStoppedException |
| static class | ListFile.RollingMetricPerformanceTracker: Tracks metrics using a rolling window of time, in which older metrics are 'aged off' by calling ListFile.RollingMetricPerformanceTracker.purgeTimingInfo(long). |
| private static class | ListFile.StandardOperationStatistics |
| private static class | ListFile.TimedOperationKey |
| private static class | ListFile.TimingInfo: Provides a mechanism for timing how long a particular operation takes to complete, logging if it takes longer than the configured threshold. |
| static class | ListFile.UntrackedPerformanceTracker: A PerformanceTracker that is capable of tracking which disk access operation is active and which directory is actively being listed, as well as timing specific operations, but does not track metrics over any amount of time. |
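
The tracker descriptions above combine two ideas: flagging an operation that takes longer than a configured threshold, and aggregating timings over a rolling window from which older entries are aged off. The sketch below illustrates both under stated assumptions. RollingTimingSketch and its members are hypothetical; only the method name purgeTimingInfo is borrowed from the table, and the meaning given to its long argument here (a cutoff timestamp in milliseconds) is an assumption, not the documented contract.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical rolling-window timing sketch; not NiFi's RollingMetricPerformanceTracker. */
final class RollingTimingSketch {

    /** One completed operation: when it finished and how long it took. */
    private static final class Timing {
        final long finishedAtMillis;
        final long durationMillis;

        Timing(long finishedAtMillis, long durationMillis) {
            this.finishedAtMillis = finishedAtMillis;
            this.durationMillis = durationMillis;
        }
    }

    // Timings in the order they were recorded (assumed chronological), oldest first.
    private final Deque<Timing> window = new ArrayDeque<>();
    private final long warnThresholdMillis;

    RollingTimingSketch(long warnThresholdMillis) {
        this.warnThresholdMillis = warnThresholdMillis;
    }

    /** Records a timing and returns true if it exceeded the threshold, so the caller can log a warning. */
    boolean record(long finishedAtMillis, long durationMillis) {
        window.addLast(new Timing(finishedAtMillis, durationMillis));
        return durationMillis > warnThresholdMillis;
    }

    /** Ages off every timing that finished before the cutoff (assumed semantics). */
    void purgeTimingInfo(long cutoffMillis) {
        while (!window.isEmpty() && window.peekFirst().finishedAtMillis < cutoffMillis) {
            window.removeFirst();
        }
    }

    int count() {
        return window.size();
    }

    long totalDurationMillis() {
        long total = 0;
        for (Timing timing : window) {
            total += timing.durationMillis;
        }
        return total;
    }

    public static void main(String[] args) {
        RollingTimingSketch tracker = new RollingTimingSketch(100);
        System.out.println(tracker.record(1_000, 50));   // under the threshold
        System.out.println(tracker.record(2_000, 150));  // over the threshold
        tracker.purgeTimingInfo(2_000);                  // drops the timing that finished at 1000
        System.out.println(tracker.count() + " timings, " + tracker.totalDurationMillis() + " ms total");
    }
}
```

A deque keeps purging cheap because aged-off entries are always at the head, provided timings are recorded in chronological order.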
Nested types inherited from AbstractListProcessor: AbstractListProcessor.ListingMode
Fields inherited from AbstractListProcessor: BY_ENTITIES, BY_TIME_WINDOW, BY_TIMESTAMPS, DISTRIBUTED_CACHE_SERVICE, LISTING_LAG_MILLIS, LISTING_STRATEGY, NO_TRACKING, PRECISION_AUTO_DETECT, PRECISION_MILLIS, PRECISION_MINUTES, PRECISION_SECONDS, RECORD_WRITER, REL_SUCCESS, TARGET_SYSTEM_TIMESTAMP_PRECISION

| Constructor and Description |
|---|
| ListFile() |

Inherited methods:
createListedEntityTracker, customValidate, customValidate, getCurrentNanoTime, getCurrentTime, getDefaultTimePrecision, getKey, getPersistenceFile, initListedEntityTracker, listByNoTracking, listByTimeWindow, listByTrackingTimestamps, onPrimaryNodeChange, onPropertyModified, onTrigger, updateState, verify
onTrigger
getControllerServiceLookup, getIdentifier, getLogger, getNodeTypeProvider, initialize, isConfigurationRestored, isScheduled, toString, updateConfiguredRestoredTrue, updateScheduledFalse, updateScheduledTrue
equals, getPropertyDescriptor, getPropertyDescriptors, getSupportedDynamicPropertyDescriptor, hashCode, validate
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
isStateful
getPropertyDescriptor, getPropertyDescriptors, validate

static final AllowableValue LOCATION_LOCAL
static final AllowableValue LOCATION_REMOTE
public static final PropertyDescriptor DIRECTORY
public static final PropertyDescriptor RECURSE
public static final PropertyDescriptor DIRECTORY_LOCATION
public static final PropertyDescriptor FILE_FILTER
public static final PropertyDescriptor PATH_FILTER
public static final PropertyDescriptor INCLUDE_FILE_ATTRIBUTES
public static final PropertyDescriptor MIN_AGE
public static final PropertyDescriptor MAX_AGE
public static final PropertyDescriptor MIN_SIZE
public static final PropertyDescriptor MAX_SIZE
public static final PropertyDescriptor IGNORE_HIDDEN_FILES
public static final PropertyDescriptor TRACK_PERFORMANCE
public static final PropertyDescriptor MAX_TRACKED_FILES
public static final PropertyDescriptor MAX_DISK_OPERATION_TIME
public static final PropertyDescriptor MAX_LISTING_TIME
private List<PropertyDescriptor> properties
private Set<Relationship> relationships
private volatile ScheduledExecutorService monitoringThreadPool
private volatile Future<?> monitoringFuture
private volatile boolean includeFileAttributes
private volatile ListFile.PerformanceTracker performanceTracker
private volatile long performanceLoggingTimestamp
public static final String FILE_CREATION_TIME_ATTRIBUTE
public static final String FILE_LAST_MODIFY_TIME_ATTRIBUTE
public static final String FILE_LAST_ACCESS_TIME_ATTRIBUTE
public static final String FILE_SIZE_ATTRIBUTE
public static final String FILE_OWNER_ATTRIBUTE
public static final String FILE_GROUP_ATTRIBUTE
public static final String FILE_PERMISSIONS_ATTRIBUTE
public static final String FILE_MODIFY_DATE_ATTR_FORMAT
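
The timestamp and permissions attributes are documented in the class annotations as 'yyyy-MM-dd'T'HH:mm:ssZ' and as nine characters such as rw-rw-r--. The sketch below shows one way to produce strings in those shapes with the standard library. It is an illustration only: FileAttributeFormatSketch and its methods are hypothetical, the values of the constants listed above are not shown on this page, and the time zone the processor formats in is an assumption passed in by the caller.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helpers that format values in the documented attribute shapes. */
final class FileAttributeFormatSketch {

    // Pattern copied from the attribute descriptions; 'Z' renders the offset as +HHmm.
    private static final DateTimeFormatter TIMESTAMP_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssZ");

    /** Formats an epoch-millisecond timestamp in the given zone (the zone is an assumption). */
    static String formatTimestamp(long epochMillis, ZoneId zone) {
        return TIMESTAMP_FORMAT.format(Instant.ofEpochMilli(epochMillis).atZone(zone));
    }

    /** Formats permissions as 3 characters each for owner, group, and others. */
    static String formatPermissions(Set<PosixFilePermission> permissions) {
        return PosixFilePermissions.toString(permissions);
    }

    public static void main(String[] args) {
        System.out.println(formatTimestamp(0L, ZoneOffset.UTC));
        System.out.println(formatPermissions(EnumSet.of(
                PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE,
                PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE,
                PosixFilePermission.OTHERS_READ)));
    }
}
```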
protected void init(ProcessorInitializationContext context)
init in class AbstractSessionFactoryProcessor
protected List<PropertyDescriptor> getSupportedPropertyDescriptors()
getSupportedPropertyDescriptors in class AbstractConfigurableComponent
public Set<Relationship> getRelationships()
getRelationships in interface Processor
getRelationships in class AbstractListProcessor<FileInfo>
@OnScheduled public void onScheduled(ProcessContext context)
@OnStopped public void onStopped(ProcessContext context)
protected ListFile.PerformanceTracker getPerformanceTracker()
public void logPerformance()
protected Map<String,String> createAttributes(FileInfo fileInfo, ProcessContext context)
createAttributes in class AbstractListProcessor<FileInfo>
protected String getPath(ProcessContext context)
getPath in class AbstractListProcessor<FileInfo>
protected Scope getStateScope(PropertyContext context)
getStateScope in class AbstractListProcessor<FileInfo>
protected RecordSchema getRecordSchema()
getRecordSchema in class AbstractListProcessor<FileInfo>
protected Integer countUnfilteredListing(ProcessContext context) throws IOException
countUnfilteredListing in class AbstractListProcessor<FileInfo>
Throws: IOException
protected List<FileInfo> performListing(ProcessContext context, Long minTimestamp, AbstractListProcessor.ListingMode listingMode) throws IOException
performListing in class AbstractListProcessor<FileInfo>
Throws: IOException
private List<FileInfo> performListing(ProcessContext context, Long minTimestamp, AbstractListProcessor.ListingMode listingMode, boolean applyFilters) throws IOException
Throws: IOException
protected String getListingContainerName(ProcessContext context)
getListingContainerName in class AbstractListProcessor<FileInfo>
protected boolean isListingResetNecessary(PropertyDescriptor property)
isListingResetNecessary in class AbstractListProcessor<FileInfo>
private BiPredicate<Path,BasicFileAttributes> createFileFilter(ProcessContext context, ListFile.PerformanceTracker performanceTracker, boolean applyFilters, Path basePath)
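
createFileFilter returns a BiPredicate<Path,BasicFileAttributes>, which is the filter type that java.nio.file.Files.find accepts. The sketch below shows how a predicate of that type can combine a file-name pattern with a minimum size and how a recursion flag maps to a search depth. It is a simplified illustration, not the private method's logic: FileFilterSketch, createFilter, list, and demo are hypothetical names, and age, path, and hidden-file checks are left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of a name and size filter used with Files.find; not ListFile's filter. */
final class FileFilterSketch {

    /** Accepts regular files whose name matches the pattern and whose size is at least minSizeBytes. */
    static BiPredicate<Path, BasicFileAttributes> createFilter(Pattern namePattern, long minSizeBytes) {
        return (path, attrs) -> attrs.isRegularFile()
                && namePattern.matcher(path.getFileName().toString()).matches()
                && attrs.size() >= minSizeBytes;
    }

    /** Lists matching files relative to dir; recurse=false limits the search to direct children. */
    static List<String> list(Path dir, boolean recurse, BiPredicate<Path, BasicFileAttributes> filter) {
        int maxDepth = recurse ? Integer.MAX_VALUE : 1;
        try (Stream<Path> found = Files.find(dir, maxDepth, filter)) {
            return found.map(p -> dir.relativize(p).toString()).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small temporary tree (left on disk afterwards) and lists non-empty .txt files in it. */
    static List<String> demo(boolean recurse) {
        try {
            Path dir = Files.createTempDirectory("listfile-sketch");
            Files.write(dir.resolve("a.txt"), new byte[5]);
            Files.write(dir.resolve("b.log"), new byte[5]);
            Files.write(dir.resolve("empty.txt"), new byte[0]);
            Path sub = Files.createDirectory(dir.resolve("sub"));
            Files.write(sub.resolve("c.txt"), new byte[10]);
            return list(dir, recurse, createFilter(Pattern.compile(".*\\.txt"), 1));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // direct children only
        System.out.println(demo(true));  // includes the file under sub
    }
}
```

Because the predicate receives the file attributes that the directory walk has already read, the size check does not need a separate attribute lookup per file.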
Copyright © 2023 Apache NiFi Project. All rights reserved.