Interface ScanBuilder

All Known Implementing Classes:
ScanBuilderImpl

@Evolving public interface ScanBuilder
Builder to construct a Scan object.
Since:
3.0.0
  • Method Details

    • withFilter

      ScanBuilder withFilter(Engine engine, Predicate predicate)
      Apply the given filter expression to prune any files that cannot possibly contain data satisfying the given filter.

      Kernel uses the scan file partition values (for partitioned tables) and file-level column statistics (min, max, null count, etc.) in the Delta metadata for filtering. Sometimes this metadata is not enough to determine with certainty that a scan file contains no data satisfying the filter.

      For example, suppose the filter is `a = 2`. In file A, column `a` has a min value of -40 and a max value of 200. In file B, column `a` has a min value of 78 and a max value of 323. File B can be ruled out because 2 is below its minimum, so it cannot contain rows where `a = 2`. File A cannot be ruled out because 2 lies within its range, so it may contain such rows.
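
      The min/max reasoning in this example can be sketched in Java. This is a minimal illustration, not Kernel's data-skipping implementation: FileStats and MinMaxPruning are hypothetical names, only an equality filter on a single long column is handled, and null counts and missing statistics are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of min/max data skipping for an equality filter `a = value`.
// FileStats is a hypothetical stand-in for the per-file column statistics
// kept in the Delta log; it is not a Delta Kernel type.
final class MinMaxPruning {

    static final class FileStats {
        final String path;
        final long min; // minimum value of column `a` in the file
        final long max; // maximum value of column `a` in the file

        FileStats(String path, long min, long max) {
            this.path = path;
            this.min = min;
            this.max = max;
        }
    }

    // A file can only hold rows with a = value if value lies in [min, max].
    // true means "might contain matches", never "definitely contains matches".
    static boolean mayContain(FileStats stats, long value) {
        return stats.min <= value && value <= stats.max;
    }

    // Keep only the files whose statistics cannot rule them out.
    static List<String> prune(List<FileStats> files, long value) {
        List<String> kept = new ArrayList<>();
        for (FileStats f : files) {
            if (mayContain(f, value)) {
                kept.add(f.path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<FileStats> files = List.of(
                new FileStats("A", -40, 200),
                new FileStats("B", 78, 323));
        System.out.println(prune(files, 2)); // prints [A]
    }
}
```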

      Because filtering is best effort, the Scan object may return scan files (through Scan.getScanFiles(Engine)) that do not satisfy the filter. It is the caller's responsibility to apply the remaining filter returned by Scan.getRemainingFilter() to the data read from those scan files, so that data not satisfying the filter is completely filtered out.
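
      A minimal sketch of the caller's side of this contract, assuming rows are modeled as long values of column a and the remaining filter as a java.util.function.LongPredicate wrapped in an Optional (empty meaning nothing is left to apply). RemainingFilterSketch is a hypothetical name; with Kernel itself, the filter would come from Scan.getRemainingFilter() and be evaluated against the data read from the scan files.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.function.LongPredicate;

// Sketch of applying a "remaining filter" to rows read from scan files that
// survived pruning. Rows are modeled as values of column `a`, and the filter
// is a plain LongPredicate rather than a Delta Kernel Predicate (an assumption
// made so the example runs without Kernel on the classpath).
final class RemainingFilterSketch {

    // An empty Optional models the case where no filtering work remains.
    static long[] applyRemaining(long[] rows, Optional<LongPredicate> remaining) {
        if (!remaining.isPresent()) {
            return rows;
        }
        LongPredicate p = remaining.get();
        return Arrays.stream(rows).filter(p).toArray();
    }

    public static void main(String[] args) {
        // File A was kept by pruning (min -40, max 200), but only some rows match.
        long[] rowsFromFileA = {-40, 2, 17, 2, 200};
        Optional<LongPredicate> remaining = Optional.of(a -> a == 2);
        System.out.println(Arrays.toString(applyRemaining(rowsFromFileA, remaining)));
        // prints [2, 2]
    }
}
```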

      Parameters:
      engine - Engine instance to use in Delta Kernel.
      predicate - a Predicate to prune the metadata or data.
      Returns:
      A ScanBuilder with filter applied.
    • withReadSchema

      ScanBuilder withReadSchema(Engine engine, StructType readSchema)
      Apply the given readSchema. If the builder already has a projection applied, calling this again replaces the existing projection.
      Parameters:
      engine - Engine instance to use in Delta Kernel.
      readSchema - Subset of columns to read from the Delta table.
      Returns:
      A ScanBuilder with projection pruning.
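
      The replacement semantics of withReadSchema can be sketched with a hypothetical builder that models the read schema as a list of column names instead of a StructType. ProjectionSketch and its Builder are illustrative only, not Delta Kernel's ScanBuilderImpl.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of "replace, don't merge" projection semantics. The read schema is
// modeled as a list of column names; this builder is hypothetical.
final class ProjectionSketch {

    static final class Builder {
        private List<String> readColumns; // null means "read all columns"

        Builder withReadSchema(List<String> columns) {
            // A later call overwrites the earlier projection entirely.
            this.readColumns = new ArrayList<>(columns);
            return this;
        }

        // Keep only the projected columns of a row, in projection order.
        // Columns absent from the row are skipped here; a real reader would
        // validate the read schema against the table schema instead.
        Map<String, Object> project(Map<String, Object> row) {
            if (readColumns == null) {
                return row;
            }
            Map<String, Object> out = new LinkedHashMap<>();
            for (String c : readColumns) {
                if (row.containsKey(c)) {
                    out.put(c, row.get(c));
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = Map.of("a", 2, "b", "x", "c", 3.5);
        Builder b = new Builder()
                .withReadSchema(List.of("a", "b"))
                .withReadSchema(List.of("c")); // replaces [a, b]
        System.out.println(b.project(row)); // prints {c=3.5}
    }
}
```

      The alternative design would merge successive projections into [a, b, c]; under the documented behavior the later call wins, so a caller that wants additional columns passes the full column set again.
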
    • build

      Scan build()
      Returns:
      The constructed Scan instance.