public class StatsSchemaHelper
extends Object
| Constructor and Description |
|---|
| StatsSchemaHelper(StructType dataSchema) |
| Modifier and Type | Method and Description |
|---|---|
Tuple2<Column,java.util.Optional<Expression>> |
getMaxColumn(Column column)
Given a logical column in the data schema provided when this helper was created, returns the
column in the statistics schema that stores the MAX values for that logical column, together
with an optional column adjustment expression. |
Tuple2<Column,java.util.Optional<Expression>> |
getMinColumn(Column column)
Given a logical column in the data schema provided when this helper was created, returns the
column in the statistics schema that stores the MIN values for that logical column, together
with an optional column adjustment expression. |
Column |
getNullCountColumn(Column column)
Given a logical column in the data schema provided when this helper was created, returns the
NULL_COUNT column in the statistics schema that stores the null count values for that
logical column. |
Column |
getNumRecordsColumn()
Returns the NUM_RECORDS column in the statistics schema.
|
static StructType |
getStatsSchema(StructType dataSchema)
Returns the expected statistics schema given a table schema.
|
static boolean |
isSkippingEligibleLiteral(Literal literal)
Returns true if the given literal is skipping-eligible.
|
boolean |
isSkippingEligibleMinMaxColumn(Column column)
Returns true if the given column is skipping-eligible using min/max statistics.
|
boolean |
isSkippingEligibleNullCountColumn(Column column)
Returns true if the given column is skipping-eligible using null count statistics.
|
public StatsSchemaHelper(StructType dataSchema)
public static boolean isSkippingEligibleLiteral(Literal literal)
public static StructType getStatsSchema(StructType dataSchema)
Here is an example of a data schema along with the schema of the statistics that would be collected.
Data Schema:
{{{
 |-- a: struct (nullable = true)
 |    |-- b: struct (nullable = true)
 |    |    |-- c: long (nullable = true)
}}}
Collected Statistics:
{{{
 |-- stats: struct (nullable = true)
 |    |-- numRecords: long (nullable = false)
 |    |-- minValues: struct (nullable = false)
 |    |    |-- a: struct (nullable = false)
 |    |    |    |-- b: struct (nullable = false)
 |    |    |    |    |-- c: long (nullable = true)
 |    |-- maxValues: struct (nullable = false)
 |    |    |-- a: struct (nullable = false)
 |    |    |    |-- b: struct (nullable = false)
 |    |    |    |    |-- c: long (nullable = true)
 |    |-- nullCount: struct (nullable = false)
 |    |    |-- a: struct (nullable = false)
 |    |    |    |-- b: struct (nullable = false)
 |    |    |    |    |-- c: long (nullable = true)
}}}
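The relationship between the two schemas above can be sketched in plain Java. This is only an illustration of the shape shown in the example, not the library's implementation: the `Field` class and the `leaf`, `struct`, `mirror`, `statsSchema`, and `render` helpers are hypothetical stand-ins rather than Kernel types. The sketch also assumes that every leaf column gets statistics and that null counts are always stored as `long`; the real helper may exclude columns that are not skipping-eligible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StatsSchemaSketch {

    /** Hypothetical stand-in for a schema field; this is not the Kernel StructType API. */
    static final class Field {
        final String name;
        final String type;           // "struct" or a leaf type name such as "long"
        final boolean nullable;
        final List<Field> children;  // empty for leaf fields

        Field(String name, String type, boolean nullable, List<Field> children) {
            this.name = name;
            this.type = type;
            this.nullable = nullable;
            this.children = children;
        }
    }

    static Field leaf(String name, String type, boolean nullable) {
        return new Field(name, type, nullable, new ArrayList<Field>());
    }

    static Field struct(String name, boolean nullable, Field... children) {
        return new Field(name, "struct", nullable, Arrays.asList(children));
    }

    /**
     * Copies a data field into a statistics struct. Nested structs become non-nullable;
     * leaves stay nullable and keep their type unless leafType overrides it.
     */
    static Field mirror(Field f, String leafType) {
        if (!"struct".equals(f.type)) {
            return leaf(f.name, leafType != null ? leafType : f.type, true);
        }
        List<Field> kids = new ArrayList<Field>();
        for (Field child : f.children) {
            kids.add(mirror(child, leafType));
        }
        return new Field(f.name, "struct", false, kids);
    }

    /** Builds the stats struct: numRecords plus one mirrored struct per statistic. */
    static Field statsSchema(List<Field> dataFields) {
        List<Field> min = new ArrayList<Field>();
        List<Field> max = new ArrayList<Field>();
        List<Field> nulls = new ArrayList<Field>();
        for (Field f : dataFields) {
            min.add(mirror(f, null));
            max.add(mirror(f, null));
            nulls.add(mirror(f, "long")); // assumption: null counts are stored as long
        }
        return struct("stats", true,
                leaf("numRecords", "long", false),
                new Field("minValues", "struct", false, min),
                new Field("maxValues", "struct", false, max),
                new Field("nullCount", "struct", false, nulls));
    }

    /** Appends a printSchema-style tree for the field and its descendants. */
    static void render(Field f, String indent, StringBuilder out) {
        out.append(indent).append("|-- ").append(f.name).append(": ").append(f.type)
           .append(" (nullable = ").append(f.nullable).append(")\n");
        for (Field child : f.children) {
            render(child, indent + "|    ", out);
        }
    }

    public static void main(String[] args) {
        Field data = struct("a", true, struct("b", true, leaf("c", "long", true)));
        StringBuilder out = new StringBuilder();
        render(statsSchema(Arrays.asList(data)), " ", out);
        System.out.print(out);
    }
}
```

Running `main` on the data schema `a.b.c` renders a tree with the same fields and nullability as the "Collected Statistics" listing above.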
public Tuple2<Column,java.util.Optional<Expression>> getMinColumn(Column column)
Given a logical column in the data schema provided when this helper was created, returns the
column in the statistics schema that stores the MIN values for that logical column, together
with an optional column adjustment expression.
column - the logical column name.
public Tuple2<Column,java.util.Optional<Expression>> getMaxColumn(Column column)
Given a logical column in the data schema provided when this helper was created, returns the
column in the statistics schema that stores the MAX values for that logical column, together
with an optional column adjustment expression.
column - the logical column name.
public Column getNullCountColumn(Column column)
Given a logical column in the data schema provided when this helper was created, returns the
NULL_COUNT column in the statistics schema that stores the null count values for that
logical column.
public Column getNumRecordsColumn()
public boolean isSkippingEligibleMinMaxColumn(Column column)
public boolean isSkippingEligibleNullCountColumn(Column column)
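The column lookups described for getMinColumn, getMaxColumn, and getNullCountColumn can be pictured as a path rewrite over the statistics layout shown in the getStatsSchema example. The sketch below is an assumption-laden illustration, not the Kernel implementation: `StatsColumnPathSketch` and `statsPath` are hypothetical names, paths are modeled as plain lists of field names relative to the statistics struct, and the optional column adjustment expression is not modeled at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StatsColumnPathSketch {

    // Field names taken from the statistics schema example in this page.
    static final String MIN = "minValues";
    static final String MAX = "maxValues";
    static final String NULL_COUNT = "nullCount";

    /**
     * Prefixes a logical column path with the name of the statistic struct.
     * Hypothetical helper: assumes each statistic struct mirrors the data schema.
     */
    static List<String> statsPath(String statName, String... logicalPath) {
        List<String> path = new ArrayList<String>();
        path.add(statName);
        path.addAll(Arrays.asList(logicalPath));
        return path;
    }

    public static void main(String[] args) {
        System.out.println(statsPath(MIN, "a", "b", "c"));        // prints [minValues, a, b, c]
        System.out.println(statsPath(MAX, "a", "b", "c"));        // prints [maxValues, a, b, c]
        System.out.println(statsPath(NULL_COUNT, "a", "b", "c")); // prints [nullCount, a, b, c]
    }
}
```

The actual helper works with logical columns and may translate names or apply adjustments that this sketch ignores; treat it only as a mental model of where each statistic lives.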