Class ScanBuilderImpl
- All Implemented Interfaces:
ScanBuilder
-
Constructor Summary
Constructors
Constructor | Description
ScanBuilderImpl(Path dataPath, Protocol protocol, Metadata metadata, StructType snapshotSchema, LogReplay logReplay, Engine engine) |
-
Method Summary
Modifier and Type | Method | Description
Scan | build() |
ScanBuilder | withFilter(Engine engine, Predicate predicate) | Apply the given filter expression to prune any files that cannot possibly contain data satisfying the filter.
ScanBuilder | withReadSchema(Engine engine, StructType readSchema) | Apply the given readSchema.
-
Constructor Details
-
ScanBuilderImpl
-
-
Method Details
-
withFilter
Description copied from interface: ScanBuilder
Apply the given filter expression to prune any files that cannot possibly contain data satisfying the filter.
Kernel uses the scan file partition values (for partitioned tables) and file-level column statistics (min, max, null count, etc.) in the Delta metadata for filtering. Sometimes this metadata is not enough to say deterministically that a scan file doesn't contain data that satisfies the filter.
E.g. the given filter is `a = 2`. In file A, column a has a min value of -40 and a max value of 200. In file B, column a has a min value of 78 and a max value of 323. File B can be ruled out because it cannot possibly have rows where `a = 2`, but file A cannot be ruled out because it may contain rows where `a = 2`.
As filtering is best effort, the Scan object may return scan files (through Scan.getScanFiles(Engine)) that do not satisfy the filter. It is the responsibility of the caller to apply the remaining filter returned by Scan.getRemainingFilter() to the data read from the scan files (returned by Scan.getScanFiles(Engine)) to completely filter out the data that doesn't satisfy the filter.
- Specified by:
withFilter in interface ScanBuilder
- Parameters:
engine - Engine instance to use in Delta Kernel.
predicate - a Predicate to prune the metadata or data.
- Returns:
- A ScanBuilder with filter applied.
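The file A / file B example above can be illustrated with a small standalone sketch. It does not use the Delta Kernel API: FileStats, mightContain, prune, and applyRemainingFilter are hypothetical names invented for this illustration, and the sketch only handles an integer equality filter on a single column. It shows the two stages the description implies: best-effort pruning from min/max statistics, followed by the caller applying the remaining filter to the rows actually read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: none of these types belong to Delta Kernel.
final class MinMaxPruningSketch {

    /** Hypothetical stand-in for a scan file with min/max statistics on column a. */
    static final class FileStats {
        final String name;
        final int minA;
        final int maxA;
        final List<Integer> rowsA; // values of column a stored in the file

        FileStats(String name, int minA, int maxA, List<Integer> rowsA) {
            this.name = name;
            this.minA = minA;
            this.maxA = maxA;
            this.rowsA = rowsA;
        }
    }

    /** True if min/max alone cannot rule out a row with a == target. */
    static boolean mightContain(FileStats file, int target) {
        return file.minA <= target && target <= file.maxA;
    }

    /** Best-effort pruning: drop only files that provably have no match. */
    static List<FileStats> prune(List<FileStats> files, int target) {
        List<FileStats> kept = new ArrayList<>();
        for (FileStats file : files) {
            if (mightContain(file, target)) {
                kept.add(file);
            }
        }
        return kept;
    }

    /** The caller's job: apply the remaining filter to rows from surviving files. */
    static List<Integer> applyRemainingFilter(List<FileStats> files, int target) {
        List<Integer> matches = new ArrayList<>();
        for (FileStats file : files) {
            for (int value : file.rowsA) {
                if (value == target) {
                    matches.add(value);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Statistics taken from the example in the description; row values are made up.
        FileStats a = new FileStats("A", -40, 200, Arrays.asList(-40, 2, 17, 200));
        FileStats b = new FileStats("B", 78, 323, Arrays.asList(78, 150, 323));

        List<FileStats> kept = prune(Arrays.asList(a, b), 2);
        for (FileStats file : kept) {
            System.out.println("kept file " + file.name);
        }
        System.out.println("matching rows: " + applyRemainingFilter(kept, 2));
    }
}
```

File A survives pruning even though most of its rows do not match, which is why the remaining filter still has to be applied to the data read from it.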
-
withReadSchema
Description copied from interface: ScanBuilder
Apply the given readSchema. If the builder already has a projection applied, calling this again replaces the existing projection.
- Specified by:
withReadSchema in interface ScanBuilder
- Parameters:
engine - Engine instance to use in Delta Kernel.
readSchema - Subset of columns to read from the Delta table.
- Returns:
- A ScanBuilder with projection pruning.
-
build
- Specified by:
build in interface ScanBuilder
- Returns:
- Build the Scan instance
-