Class DataSkippingUtils

Object
io.delta.kernel.internal.skipping.DataSkippingUtils

public class DataSkippingUtils extends Object
  • Constructor Details

    • DataSkippingUtils

      public DataSkippingUtils()
  • Method Details

    • parseJsonStats

      public static ColumnarBatch parseJsonStats(Engine engine, FilteredColumnarBatch scanFileBatch, StructType statsSchema)
      Parses the JSON statistics of the scan files in the given FilteredColumnarBatch according to the given statistics schema, and returns the parsed stats as a ColumnarBatch.
    • pruneStatsSchema

      public static StructType pruneStatsSchema(StructType schema, Set<Column> referencedLeafCols)
      Prunes the given schema so that it includes only the referenced leaf columns. A leaf column that is nested must be referenced by its full column path, e.g. "C_0.C_1.C_leaf".
      Parameters:
      schema - the schema to prune
      referencedLeafCols - the set of leaf columns in schema to keep
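      The pruning described above can be illustrated with a small, self-contained sketch. It does not use the Kernel types: the schema is modeled as a nested Map (a hypothetical stand-in for StructType), and leaf columns are referenced by dotted path strings (a stand-in for Column). Dropping structs that end up with no fields is an assumption of this sketch, not documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PruneSchemaSketch {
    /**
     * Keeps only the leaves whose full dotted path is in leafPaths.
     * A nested Map models a struct; any other value models a leaf type.
     * Hypothetical model, not the Delta Kernel API.
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> prune(
            Map<String, Object> schema, Set<String> leafPaths, String prefix) {
        Map<String, Object> pruned = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : schema.entrySet()) {
            String path = prefix.isEmpty()
                    ? field.getKey()
                    : prefix + "." + field.getKey();
            if (field.getValue() instanceof Map) {
                // Recurse into the struct and keep it only if a leaf survived
                // (assumption of this sketch).
                Map<String, Object> nested =
                        prune((Map<String, Object>) field.getValue(), leafPaths, path);
                if (!nested.isEmpty()) {
                    pruned.put(field.getKey(), nested);
                }
            } else if (leafPaths.contains(path)) {
                pruned.put(field.getKey(), field.getValue());
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        Map<String, Object> c1 = new LinkedHashMap<>();
        c1.put("C_leaf", "int");
        c1.put("other", "string");
        Map<String, Object> c0 = new LinkedHashMap<>();
        c0.put("C_1", c1);
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("a", "long");
        schema.put("C_0", c0);
        schema.put("b", "string");

        // Referencing only "C_leaf" would not match; the full path is required.
        System.out.println(prune(schema, Set.of("a", "C_0.C_1.C_leaf"), ""));
        // prints {a=long, C_0={C_1={C_leaf=int}}}
    }
}
```

      In this model, a reference to the bare name "C_leaf" matches nothing, which is why the full path is required.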
    • constructDataSkippingFilter

      public static Optional<DataSkippingPredicate> constructDataSkippingFilter(Predicate dataFilters, StructType dataSchema)
      Constructs, if possible, a data skipping filter from the given query data filter. The data skipping filter uses column statistics to prune files: it evaluates to FALSE for any file that can be safely skipped. If it evaluates to NULL or TRUE, the file must not be skipped.
      Parameters:
      dataFilters - query filters on the data columns
      dataSchema - the data schema of the table
      Returns:
      a DataSkippingPredicate that can be used to prune files, or an empty Optional if no data skipping filter can be constructed
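      The FALSE / NULL / TRUE rule above can be sketched as follows. This is an illustration only and does not use the Kernel API: it assumes a query filter of the form col > literal, whose skipping form over file statistics is max(col) > literal, and it models a missing statistic as a null Integer. The class and method names are hypothetical.

```java
public class DataSkippingSketch {
    /**
     * Skipping form of the query filter `col > literal`, evaluated on a
     * file's max statistic. Returns null (SQL NULL) when the statistic is
     * missing. Hypothetical helper, not part of Delta Kernel.
     */
    static Boolean maxGreaterThan(Integer maxValue, int literal) {
        if (maxValue == null) {
            return null;
        }
        return maxValue > literal;
    }

    /** A file may be skipped only when the filter is definitely FALSE. */
    static boolean mustReadFile(Boolean filterResult) {
        return !Boolean.FALSE.equals(filterResult);
    }

    public static void main(String[] args) {
        // Query filter: col > 5
        System.out.println(mustReadFile(maxGreaterThan(3, 5)));    // false: max=3, no row can match
        System.out.println(mustReadFile(maxGreaterThan(9, 5)));    // true: some row may match
        System.out.println(mustReadFile(maxGreaterThan(null, 5))); // true: stats missing, cannot skip
    }
}
```

      Treating NULL like TRUE keeps the pruning safe: a file is dropped only when its statistics prove that no row can satisfy the query filter.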