Package org.jesterj.ingest.scanners
Class SimpleFileScanner
java.lang.Object
  org.jesterj.ingest.model.impl.StepImpl
    org.jesterj.ingest.model.impl.ScannerImpl
      org.jesterj.ingest.scanners.SimpleFileScanner
All Implemented Interfaces:
java.lang.Iterable<Document>, java.lang.Runnable, java.util.Collection<Document>, java.util.concurrent.BlockingQueue<Document>, java.util.Queue<Document>, Active, Configurable, DeferredBuilding, Scanner, Step, FileScanner
public class SimpleFileScanner extends ScannerImpl implements FileScanner
Scanner for local filesystems. This scanner periodically performs a full walk of the filesystem. No persistent record of the files detected during a walk is kept, and every file is visited on each scan, so it is highly recommended to use this scanner with the remembering option turned on unless a regular full re-index is desired. If walking the filesystem takes longer than the scan interval, the walk time, rather than the interval, will determine index latency. This scanner will not start a new scan until the current one completes. Each file to be processed must fit in JVM memory.
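Because nothing is recorded between walks, every scan revisits every file. The sketch below illustrates the general idea behind pairing a full walk with remembering and hashing: only files whose content hash differs from the previous scan are reported. It is a hypothetical, in-memory stand-in (RememberingWalker and scan are illustrative names, not JesterJ API). The ScannerImpl members listed on this page (getCassandra, CREATE_DOC_HASH) suggest that JesterJ keeps such state in Cassandra, but its actual change-detection logic is not described here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only: an in-memory stand-in for "remembering" + hashing.
// Unlike a persistent store, this map is lost when the JVM exits.
class RememberingWalker {
    // Last content hash seen for each file path.
    private final Map<Path, String> seen = new HashMap<>();

    // Performs one full walk of root and returns only files that are new or changed.
    List<Path> scan(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Path> changed = new ArrayList<>();
        for (Path p : files) {
            byte[] content = Files.readAllBytes(p); // the whole file is held in memory
            String hash = sha256(content);
            String previous = seen.put(p, hash);
            if (!hash.equals(previous)) {
                changed.add(p); // first sighting, or content differs from the last scan
            }
        }
        return changed;
    }

    private static String sha256(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

Without the hash map, the loop would report every file on every scan, which is the full re-index behavior the description warns about.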
Nested Class Summary
Modifier and Type | Class | Description
static class | SimpleFileScanner.Builder |
Nested classes/interfaces inherited from class org.jesterj.ingest.model.impl.ScannerImpl
ScannerImpl.ScanOp
Field Summary
Fields inherited from class org.jesterj.ingest.model.impl.ScannerImpl
activeScans, CREATE_DOC_HASH, CREATE_FT_KEYSPACE, CREATE_FT_TABLE, CREATE_INDEX_STATUS, DDL_TIMEOUT, DEF_MAX_ERROR_RETRY, FTI_ORIGIN, NEW_CONTENT_FOUND_MSG, SCAN_ORIGIN, TIMEOUT
Fields inherited from interface org.jesterj.ingest.model.Configurable
VALID_NAME
Fields inherited from interface org.jesterj.ingest.model.Step
JJ_PLAN_NAME, JJ_PLAN_VERSION
Constructor Summary
Modifier | Constructor | Description
protected | SimpleFileScanner() |
Method Summary
Modifier and Type | Method | Description
java.util.Optional<Document> | fetchById(java.lang.String id, java.lang.String origination) | Load a document based on the document's id.
ScannerImpl.ScanOp | getScanOperation() | The default scan operation is to check the Cassandra database for records marked dirty or restart and process those records using the scanner's document fetching logic (empty by default).
boolean | isScanning() | True if a new scan may be started.
protected void | setScanning(boolean scanning) |
Methods inherited from class org.jesterj.ingest.model.impl.ScannerImpl
activate, add, addAll, addPredecessor, clear, contains, containsAll, deactivate, docFound, drainTo, drainTo, element, getCassandra, getInterval, getLogger, isActivePriorSteps, isEmpty, isHashing, isHeuristicallyDirty, isRemembering, isScanActive, iterator, keySpace, offer, offer, peek, poll, poll, processDirty, processPendingDocs, put, remainingCapacity, remove, remove, removeAll, retainAll, run, scanFinished, scanStarted, sendToNext, setCassandra, setInterval, take, toArray, toArray
Methods inherited from class org.jesterj.ingest.model.impl.StepImpl
addDeferred, executeDeferred, forEach, getBatchSize, getDownstreamOutputSteps, getEligibleNextSteps, getName, getNextSteps, getNextSteps, getOutputDestinationNames, getPatternForStep, getPlan, getPriorSteps, getProcessor, getRouter, isActive, isOutputStep, parallelStream, removeIf, reportException, size, spliterator, stream, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.util.Collection
equals, hashCode, parallelStream, removeIf, size, spliterator, stream, toArray
Methods inherited from interface org.jesterj.ingest.model.Configurable
getName, isValidName
Methods inherited from interface org.jesterj.ingest.model.DeferredBuilding
addDeferred, executeDeferred
Methods inherited from interface org.jesterj.ingest.scanners.FileScanner
addAttrs
Methods inherited from interface org.jesterj.ingest.model.Scanner
getDocumentTracker, getIdFunction
Methods inherited from interface org.jesterj.ingest.model.Step
getBatchSize, getDownstreamOutputSteps, getEligibleNextSteps, getNextSteps, getNextSteps, getOutputDestinationNames, getPlan, getPriorSteps, getRouter, isOutputDestinationThisStep, isOutputStep
Method Detail
getScanOperation
public ScannerImpl.ScanOp getScanOperation()
Description copied from class: ScannerImpl
The default scan operation is to check the Cassandra database for records marked dirty or restart and process those records using the scanner's document fetching logic (empty by default).
Specified by:
getScanOperation in interface Scanner
Specified by:
getScanOperation in class ScannerImpl
Returns:
a Runnable object that locates documents.
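Since the scan operation is a Runnable that locates documents, a file-walking scan can be pictured as one full Files.walk per run. The following is a minimal sketch under that assumption; WalkingScanOp and create are hypothetical names, and the real SimpleFileScanner operation (which must also build Document objects and honor remembering and hashing) is not shown on this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical sketch, not JesterJ's implementation.
final class WalkingScanOp {
    private WalkingScanOp() {}

    // Builds a Runnable that performs one full walk of root,
    // handing every regular file it finds to onFound.
    static Runnable create(Path root, Consumer<Path> onFound) {
        return () -> {
            try (Stream<Path> paths = Files.walk(root)) {
                paths.filter(Files::isRegularFile).forEach(onFound);
            } catch (IOException e) {
                // Runnable.run() cannot throw checked exceptions, so wrap the failure.
                throw new UncheckedIOException(e);
            }
        };
    }
}
```

Returning a Runnable rather than running the walk directly lets the caller decide when and on which thread each scan executes.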
isScanning
public boolean isScanning()
Description copied from interface: Scanner
True if a new scan may be started. Implementations may choose not to start a new scan until the old one has completed. This value is independent of Active.isActive().
Specified by:
isScanning in interface Scanner
Returns:
true if a new scan should be started
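One way to get "no new scan until the old one has completed" behavior, shown here with plain JDK scheduling rather than JesterJ's own mechanism (which this page does not describe), is ScheduledExecutorService.scheduleAtFixedRate: the JDK documents that if an execution takes longer than its period, subsequent executions may start late but will not run concurrently. NonOverlappingScans and its members are hypothetical names for this demo.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: a slow "scan" delays the next one instead of overlapping it.
final class NonOverlappingScans {
    final AtomicInteger running = new AtomicInteger();
    final AtomicInteger maxConcurrent = new AtomicInteger();
    final AtomicInteger finished = new AtomicInteger();

    // Simulated scan that takes longer than the 10 ms scheduling period.
    void slowScan() {
        int now = running.incrementAndGet();
        maxConcurrent.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(30); // stands in for a filesystem walk
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            running.decrementAndGet();
            finished.incrementAndGet();
        }
    }

    void runFor(long millis) throws InterruptedException {
        // Several pool threads are available, yet runs of one periodic task never overlap.
        ScheduledExecutorService ses = Executors.newScheduledThreadPool(4);
        ses.scheduleAtFixedRate(this::slowScan, 0, 10, TimeUnit.MILLISECONDS);
        Thread.sleep(millis);
        ses.shutdownNow();
        ses.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

With a 10 ms period and 30 ms scans, runs happen back to back at roughly the walk duration, so the walk time rather than the interval sets the cadence.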
fetchById
public java.util.Optional<Document> fetchById(java.lang.String id, java.lang.String origination)
Description copied from interface: Scanner
Load a document based on the document's id.
Specified by:
fetchById in interface Scanner
Parameters:
id - the id of the document; see also Document.getId()
origination - a constant indicating the source (scanner or fti) for debugging
Returns:
an Optional that contains the document if it is possible to retrieve the document by id
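For a file-based scanner, fetching by id amounts to re-reading a single file. The sketch below assumes, hypothetically, that the id is a filesystem path string (this page does not state SimpleFileScanner's id format) and returns the raw bytes instead of a JesterJ Document. FileFetch is an illustrative name only, and origination is used here just for a debugging message.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical sketch, not JesterJ's implementation.
final class FileFetch {
    private FileFetch() {}

    // Assumes id is a filesystem path; returns empty when the file cannot be read.
    static Optional<byte[]> fetchById(String id, String origination) {
        try {
            Path p = Paths.get(id);
            if (!Files.isRegularFile(p) || !Files.isReadable(p)) {
                return Optional.empty();
            }
            return Optional.of(Files.readAllBytes(p)); // whole file loaded into JVM memory
        } catch (InvalidPathException | IOException e) {
            System.err.println("fetchById failed (" + origination + "): " + id + " - " + e);
            return Optional.empty();
        }
    }
}
```

Returning Optional.empty() for unreadable or vanished files mirrors the documented contract that the document is present only "if it is possible to retrieve the document by id".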
setScanning
protected void setScanning(boolean scanning)