public class ScanBuilderImpl extends Object implements ScanBuilder
Implementation of ScanBuilder.

| Constructor and Description |
|---|
| ScanBuilderImpl(Path dataPath, Protocol protocol, Metadata metadata, StructType snapshotSchema, LogReplay logReplay, Engine engine) |

| Modifier and Type | Method and Description |
|---|---|
| Scan | build() |
| ScanBuilder | withFilter(Engine engine, Predicate predicate) Apply the given filter expression to prune any files that cannot possibly contain data that satisfies the filter. |
| ScanBuilder | withReadSchema(Engine engine, StructType readSchema) Apply the given readSchema. |

public ScanBuilder withFilter(Engine engine, Predicate predicate)
Kernel uses the scan file partition values (for partitioned tables) and file-level column statistics (min, max, null count, etc.) in the Delta metadata for filtering. Sometimes this metadata is not enough to say deterministically that a scan file contains no data satisfying the filter.
For example, suppose the filter is `a = 2`. In file A, column a has a min value of -40 and a max
value of 200. In file B, column a has a min value of 78 and a max value of 323. File B can
be ruled out because it cannot possibly have rows where `a = 2`, but file A cannot be ruled out
because it may contain rows where `a = 2`.
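
The file A / file B reasoning above can be sketched in plain Java. This is only an illustration of min/max pruning for an equality filter, not Kernel's actual implementation: `MinMaxPruningSketch`, `FileStats`, `mayContain`, and `prune` are hypothetical names that do not exist in Delta Kernel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative and are not Delta Kernel classes.
class MinMaxPruningSketch {

    /** Per-file min/max statistics for a single integer column. */
    static final class FileStats {
        final String name;
        final int min;
        final int max;

        FileStats(String name, int min, int max) {
            this.name = name;
            this.min = min;
            this.max = max;
        }
    }

    /** True if a file whose column values lie in [min, max] may contain rows equal to value. */
    static boolean mayContain(FileStats stats, int value) {
        return stats.min <= value && value <= stats.max;
    }

    /** Keeps only the files that cannot be ruled out for the filter "column = value". */
    static List<String> prune(List<FileStats> files, int value) {
        List<String> kept = new ArrayList<>();
        for (FileStats f : files) {
            if (mayContain(f, value)) {
                kept.add(f.name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<FileStats> files = new ArrayList<>();
        files.add(new FileStats("A", -40, 200)); // range covers 2, so A is kept
        files.add(new FileStats("B", 78, 323));  // range excludes 2, so B is skipped
        System.out.println(prune(files, 2));     // prints [A]
    }
}
```

Note that keeping file A says nothing about whether it really holds a row with `a = 2`; the statistics only show that it cannot be excluded.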
Because filtering is best effort, the Scan object may return scan files (through Scan.getScanFiles(Engine)) that do not satisfy the filter. It is the responsibility of the
caller to apply the remaining filter returned by Scan.getRemainingFilter() to the data
read from the scan files (returned by Scan.getScanFiles(Engine)) to completely filter
out the data that doesn't satisfy the filter.
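
As a rough illustration of the caller's side of this contract, the sketch below applies a leftover predicate to values already read from a file that pruning could not exclude. It deliberately avoids the Kernel API: `RemainingFilterSketch` and `applyRemainingFilter` are hypothetical names, and the `IntPredicate` merely stands in for whatever Scan.getRemainingFilter() returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch: none of these names are Delta Kernel APIs.
class RemainingFilterSketch {

    /** Applies the remaining filter to column values read from a scan file. */
    static List<Integer> applyRemainingFilter(List<Integer> values, IntPredicate remainingFilter) {
        List<Integer> out = new ArrayList<>();
        for (int v : values) {
            if (remainingFilter.test(v)) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values of column a read from a file that min/max statistics could not rule out.
        List<Integer> fileA = Arrays.asList(-40, 2, 17, 2, 200);
        IntPredicate remaining = a -> a == 2; // stands in for the remaining filter
        System.out.println(applyRemainingFilter(fileA, remaining)); // prints [2, 2]
    }
}
```

Skipping this step would leave the rows -40, 17, and 200 in the result even though the filter was passed to the builder.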

Specified by: withFilter in interface ScanBuilder
Parameters:
engine - Engine instance to use in Delta Kernel.
predicate - a Predicate to prune the metadata or data.
Returns: A ScanBuilder with filter applied.

public ScanBuilder withReadSchema(Engine engine, StructType readSchema)
Specified by: withReadSchema in interface ScanBuilder
Parameters:
engine - Engine instance to use in Delta Kernel.
readSchema - Subset of columns to read from the Delta table.
Returns: A ScanBuilder with projection pruning.

public Scan build()
Specified by: build in interface ScanBuilder
Returns: a Scan instance