Class StatsSchemaHelper

Object
io.delta.kernel.internal.skipping.StatsSchemaHelper

public class StatsSchemaHelper extends Object
Provides information and utilities for statistics columns given a table schema. Specifically, it is used to:
  1. Get the expected statistics schema given a table schema
  2. Check whether a Literal or Column is skipping-eligible
  3. Get the statistics column for a given stat type and logical column
  • Constructor Details

    • StatsSchemaHelper

      public StatsSchemaHelper(StructType dataSchema)
  • Method Details

    • isSkippingEligibleLiteral

      public static boolean isSkippingEligibleLiteral(Literal literal)
      Returns true if the given literal is skipping-eligible. Delta tracks min/max stats for a limited set of data types, and only literals of those types are skipping-eligible.
    • getStatsSchema

      public static StructType getStatsSchema(StructType dataSchema)
      Returns the expected statistics schema given a table schema. Here is an example of a data schema along with the schema of the statistics that would be collected.

      Data Schema:
      {{{
      |-- a: struct (nullable = true)
      |    |-- b: struct (nullable = true)
      |    |    |-- c: long (nullable = true)
      }}}

      Collected Statistics:
      {{{
      |-- stats: struct (nullable = true)
      |    |-- numRecords: long (nullable = false)
      |    |-- minValues: struct (nullable = false)
      |    |    |-- a: struct (nullable = false)
      |    |    |    |-- b: struct (nullable = false)
      |    |    |    |    |-- c: long (nullable = true)
      |    |-- maxValues: struct (nullable = false)
      |    |    |-- a: struct (nullable = false)
      |    |    |    |-- b: struct (nullable = false)
      |    |    |    |    |-- c: long (nullable = true)
      |    |-- nullCount: struct (nullable = false)
      |    |    |-- a: struct (nullable = false)
      |    |    |    |-- b: struct (nullable = false)
      |    |    |    |    |-- c: long (nullable = true)
      }}}
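      The mapping from a data schema to its statistics schema can be sketched in plain Java. This is a minimal illustration, not the Kernel implementation: the `Field` class and the `mirror`, `wrap`, `statsSchema`, and `render` helpers are hypothetical stand-ins for `StructType` and its fields, and the sketch assumes every leaf is of a type that gets min/max statistics (it does no type filtering).

```java
import java.util.ArrayList;
import java.util.List;

class StatsSchemaSketch {
    /** Hypothetical minimal schema node; not the Kernel StructType/StructField. */
    static final class Field {
        final String name;
        final String type;          // "struct" for nested fields, otherwise a leaf type name
        final boolean nullable;
        final List<Field> children; // empty for leaves

        Field(String name, String type, boolean nullable, List<Field> children) {
            this.name = name;
            this.type = type;
            this.nullable = nullable;
            this.children = children;
        }

        static Field leaf(String name, String type, boolean nullable) {
            return new Field(name, type, nullable, List.of());
        }

        static Field struct(String name, boolean nullable, Field... children) {
            return new Field(name, "struct", nullable, List.of(children));
        }
    }

    /** Copies the shape of a data field: structs become non-nullable, leaves nullable. */
    static Field mirror(Field f, String leafType) {
        if (!f.type.equals("struct")) {
            // Leaves keep their own type for min/max, or take leafType (e.g. "long") for counts.
            return Field.leaf(f.name, leafType == null ? f.type : leafType, true);
        }
        List<Field> kids = new ArrayList<>();
        for (Field c : f.children) {
            kids.add(mirror(c, leafType));
        }
        return new Field(f.name, "struct", false, kids);
    }

    /** Wraps the mirrored data fields in a non-nullable struct such as minValues. */
    static Field wrap(String name, List<Field> dataFields, String leafType) {
        List<Field> kids = new ArrayList<>();
        for (Field f : dataFields) {
            kids.add(mirror(f, leafType));
        }
        return new Field(name, "struct", false, kids);
    }

    static Field statsSchema(List<Field> dataFields) {
        return Field.struct("stats", true,
            Field.leaf("numRecords", "long", false),
            wrap("minValues", dataFields, null),
            wrap("maxValues", dataFields, null),
            wrap("nullCount", dataFields, "long"));
    }

    /** Renders a field tree in the same layout as the example above. */
    static String render(Field f, String indent) {
        StringBuilder sb = new StringBuilder();
        sb.append(indent).append("|-- ").append(f.name).append(": ").append(f.type)
          .append(" (nullable = ").append(f.nullable).append(")\n");
        for (Field c : f.children) {
            sb.append(render(c, indent + "|    "));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Field a = Field.struct("a", true,
            Field.struct("b", true, Field.leaf("c", "long", true)));
        System.out.print(render(statsSchema(List.of(a)), ""));
    }
}
```

      Running `main` on the data schema from the example renders a tree with the same fields and nullability as the collected statistics shown above.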
    • getMinColumn

      public Column getMinColumn(Column column)
      Given a logical column in the data schema that this helper was constructed with, returns the column in the statistics schema that stores the MIN values for that logical column.
    • getMaxColumn

      public Column getMaxColumn(Column column)
      Given a logical column in the data schema that this helper was constructed with, returns the column in the statistics schema that stores the MAX values for that logical column.
    • getNullCountColumn

      public Column getNullCountColumn(Column column)
      Given a logical column in the data schema that this helper was constructed with, returns the column in the statistics schema that stores the NULL_COUNT values for that logical column.
    • getNumRecordsColumn

      public Column getNumRecordsColumn()
      Returns the NUM_RECORDS column in the statistics schema.
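      As a rough illustration of what the four column getters return, the sketch below models a column as a list of field names and prefixes it with the statistic's struct name from the `getStatsSchema` example. The class and method names are hypothetical, paths are shown relative to the `stats` struct, and any translation the helper may perform between logical and stored column names is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StatsColumnPathSketch {
    // Struct names taken from the statistics schema example in getStatsSchema.
    static final String MIN_VALUES = "minValues";
    static final String MAX_VALUES = "maxValues";
    static final String NULL_COUNT = "nullCount";
    static final String NUM_RECORDS = "numRecords";

    /** Prefixes a logical column path with the name of the statistic's struct. */
    static List<String> statPath(String statType, String... logicalPath) {
        List<String> path = new ArrayList<>();
        path.add(statType);
        path.addAll(Arrays.asList(logicalPath));
        return path;
    }

    static List<String> minColumn(String... logicalPath) {
        return statPath(MIN_VALUES, logicalPath);
    }

    static List<String> maxColumn(String... logicalPath) {
        return statPath(MAX_VALUES, logicalPath);
    }

    static List<String> nullCountColumn(String... logicalPath) {
        return statPath(NULL_COUNT, logicalPath);
    }

    /** numRecords is a single top-level statistic, so it takes no logical column. */
    static List<String> numRecordsColumn() {
        return statPath(NUM_RECORDS);
    }

    public static void main(String[] args) {
        System.out.println(minColumn("a", "b", "c"));
        System.out.println(maxColumn("a", "b", "c"));
        System.out.println(nullCountColumn("a", "b", "c"));
        System.out.println(numRecordsColumn());
    }
}
```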
    • isSkippingEligibleMinMaxColumn

      public boolean isSkippingEligibleMinMaxColumn(Column column)
      Returns true if the given column is skipping-eligible using min/max statistics. This means the column exists, is a leaf column, and is of a skipping-eligible data type.
    • isSkippingEligibleNullCountColumn

      public boolean isSkippingEligibleNullCountColumn(Column column)
      Returns true if the given column is skipping-eligible using null count statistics. This means the column exists and is a leaf column, because stats are collected only for leaf columns.
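      The two column eligibility rules can be sketched as follows. This is an illustration under stated assumptions, not the Kernel code: the schema is modeled as a hypothetical map from dotted column path to type name, and the set of min/max-eligible types is supplied by the caller because this page does not list it. The example set containing only `long`, and the `binary` column `d`, are placeholders chosen for the demonstration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class SkippingEligibilitySketch {
    /** Hypothetical schema: dotted path -> type name, with "struct" marking non-leaf columns. */
    static Map<String, String> exampleSchema() {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("a", "struct");
        schema.put("a.b", "struct");
        schema.put("a.b.c", "long");
        schema.put("d", "binary"); // placeholder for a leaf type outside the eligible set
        return schema;
    }

    /** Null-count rule: the column must exist and be a leaf. */
    static boolean nullCountEligible(Map<String, String> schema, String path) {
        String type = schema.get(path);
        return type != null && !type.equals("struct");
    }

    /** Min/max rule: the null-count rule plus a type from the eligible set. */
    static boolean minMaxEligible(
            Map<String, String> schema, Set<String> eligibleTypes, String path) {
        return nullCountEligible(schema, path) && eligibleTypes.contains(schema.get(path));
    }

    public static void main(String[] args) {
        Map<String, String> schema = exampleSchema();
        Set<String> eligibleTypes = Set.of("long"); // assumed for illustration only
        for (String path : new String[] {"a.b.c", "a.b", "d", "missing"}) {
            System.out.println(path
                + " minMax=" + minMaxEligible(schema, eligibleTypes, path)
                + " nullCount=" + nullCountEligible(schema, path));
        }
    }
}
```

      In this sketch a leaf such as `d` can qualify for null-count skipping while failing the min/max check, which mirrors the difference between the two methods above.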