public class PartitionUtils
extends Object
| Modifier and Type | Method and Description |
|---|---|
| `static StructType` | `physicalSchemaWithoutPartitionColumns(StructType logicalSchema, StructType physicalSchema, java.util.Set<String> columnsToRemove)` Utility method to remove the given columns (as `columnsToRemove`) from the given `physicalSchema`. |
| `static Predicate` | `rewritePartitionPredicateOnScanFileSchema(Predicate predicate, java.util.Map<String,StructField> partitionColMetadata)` Utility method to rewrite the partition predicate referring to the table schema as a predicate referring to the `partitionValues` in scan files read from the Delta log. |
| `static Tuple2<Predicate,Predicate>` | `splitMetadataAndDataPredicates(Predicate predicate, java.util.Set<String> partitionColNames)` Split the given predicate into a predicate on partition columns and a predicate on data columns. |
| `static ColumnarBatch` | `withPartitionColumns(ExpressionHandler expressionHandler, ColumnarBatch dataBatch, java.util.Map<String,String> partitionValues, StructType schemaWithPartitionCols)` |
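The split listed for splitMetadataAndDataPredicates can be pictured with a small, self-contained sketch. It assumes the predicate is a tree of AND nodes and that only leaves referencing partition columns exclusively are moved to the partition side. The `Pred` class, the `split` and `combine` helpers, and the `ALWAYS_TRUE` placeholder are hypothetical names invented for this illustration; they are not Delta Kernel classes, and the library's actual splitting rules may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: Pred, split and combine are illustrative, not Delta Kernel APIs.
class PredicateSplitSketch {
    /** Minimal predicate node: either an AND of two children or a leaf over some columns. */
    static final class Pred {
        final String text;          // printable form of the predicate
        final Set<String> columns;  // columns referenced anywhere under this node
        final Pred left;            // non-null only for AND nodes
        final Pred right;           // non-null only for AND nodes

        private Pred(String text, Set<String> columns, Pred left, Pred right) {
            this.text = text;
            this.columns = columns;
            this.left = left;
            this.right = right;
        }

        static Pred leaf(String text, String... columns) {
            return new Pred(text, new HashSet<>(Arrays.asList(columns)), null, null);
        }

        static Pred and(Pred l, Pred r) {
            Set<String> cols = new HashSet<>(l.columns);
            cols.addAll(r.columns);
            return new Pred("(" + l.text + " AND " + r.text + ")", cols, l, r);
        }

        boolean isAnd() {
            return left != null;
        }
    }

    /** Placeholder used when one side of the split has no conditions. */
    static final Pred ALWAYS_TRUE = Pred.leaf("true");

    /** Returns a two-element array: {partition predicate, data predicate}. */
    static Pred[] split(Pred p, Set<String> partitionCols) {
        if (p.isAnd()) {
            Pred[] l = split(p.left, partitionCols);
            Pred[] r = split(p.right, partitionCols);
            return new Pred[] {combine(l[0], r[0]), combine(l[1], r[1])};
        }
        // A leaf moves to the partition side only if every column it uses is a partition column.
        if (!p.columns.isEmpty() && partitionCols.containsAll(p.columns)) {
            return new Pred[] {p, ALWAYS_TRUE};
        }
        return new Pred[] {ALWAYS_TRUE, p};
    }

    /** ANDs two predicates, dropping the ALWAYS_TRUE placeholder where possible. */
    static Pred combine(Pred a, Pred b) {
        if (a == ALWAYS_TRUE) {
            return b;
        }
        if (b == ALWAYS_TRUE) {
            return a;
        }
        return Pred.and(a, b);
    }
}
```

In this sketch a leaf that mixes partition and data columns stays entirely on the data side, which is the conservative choice: it never drops a condition, it only gives up some pruning.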
public static StructType physicalSchemaWithoutPartitionColumns(StructType logicalSchema, StructType physicalSchema, java.util.Set<String> columnsToRemove)
Utility method to remove the given columns (as columnsToRemove) from the given physicalSchema.
physicalSchema -
logicalSchema - To create a logical name to physical name map. Partition column names are in logical space and we need to identify the equivalent physical column name.
columnsToRemove -

public static ColumnarBatch withPartitionColumns(ExpressionHandler expressionHandler, ColumnarBatch dataBatch, java.util.Map<String,String> partitionValues, StructType schemaWithPartitionCols)
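The logical-to-physical name mapping described for physicalSchemaWithoutPartitionColumns can be sketched without any Delta Kernel types. The example below is a simplification under stated assumptions: schemas are reduced to lists of column names, and the logical and physical lists are assumed to describe the same columns in the same order. The class and method names are hypothetical; the real method works on StructType values and may resolve physical names differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: schemas are modelled as plain lists of column names.
class PhysicalSchemaSketch {
    /**
     * Removes columns, named in logical space, from a physical column list.
     * Assumption: logicalNames.get(i) and physicalNames.get(i) are the same column.
     */
    static List<String> withoutColumns(
            List<String> logicalNames, List<String> physicalNames, Set<String> columnsToRemove) {
        if (logicalNames.size() != physicalNames.size()) {
            throw new IllegalArgumentException("Schemas must have the same number of columns");
        }

        // Step 1: build the logical name to physical name map.
        Map<String, String> logicalToPhysical = new HashMap<>();
        for (int i = 0; i < logicalNames.size(); i++) {
            logicalToPhysical.put(logicalNames.get(i), physicalNames.get(i));
        }

        // Step 2: translate the columns to remove into physical names.
        Set<String> physicalToRemove = new HashSet<>();
        for (String logical : columnsToRemove) {
            String physical = logicalToPhysical.get(logical);
            if (physical != null) {
                physicalToRemove.add(physical);
            }
        }

        // Step 3: keep every physical column that is not marked for removal.
        List<String> result = new ArrayList<>();
        for (String physical : physicalNames) {
            if (!physicalToRemove.contains(physical)) {
                result.add(physical);
            }
        }
        return result;
    }
}
```

The translation step matters because the names in columnsToRemove are logical names; comparing them directly against physical names would silently remove nothing whenever the two naming spaces differ.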
public static Tuple2<Predicate,Predicate> splitMetadataAndDataPredicates(Predicate predicate, java.util.Set<String> partitionColNames)
Split the given predicate into a predicate on partition columns and a predicate on data columns.
predicate -
partitionColNames -

public static Predicate rewritePartitionPredicateOnScanFileSchema(Predicate predicate, java.util.Map<String,StructField> partitionColMetadata)
Utility method to rewrite the partition predicate referring to the table schema as a predicate referring to the partitionValues in scan files read from the Delta log. The scan file batch is returned by Scan.getScanFiles(TableClient).
E.g. given a predicate on partition columns:
p1 = 'new york' && p2 >= 26
where p1 is of type string and p2 is of type int, the rewritten expression looks like:
element_at(Column('add', 'partitionValues'), 'p1') = 'new york'
&&
partition_value(element_at(Column('add', 'partitionValues'), 'p2'), 'integer') >= 26
The column `add.partitionValues` is of type map(string -> string). Each partition value is stored in the string serialization format defined by the Delta protocol. The expression `partition_value` deserializes the string value into a value of the given partition column type. String type partition values don't need any deserialization.
predicate - Predicate containing filters only on partition columns.
partitionColMetadata - Map of partition column name (in lower case) to its type.
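
To make the effect of the rewrite concrete, the sketch below evaluates the example predicate against a single file's partitionValues map, deserializing each string value before comparing it. It is only an illustration under stated assumptions: `deserialize` is a hypothetical stand-in for the `partition_value` expression, it handles just a few type names, and a missing or null partition value is treated as a non-match. None of these names are Delta Kernel APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: deserialize() stands in for the partition_value expression.
class PartitionValueSketch {
    /** Converts a string-serialized partition value into a Java value of the named type. */
    static Object deserialize(String serialized, String typeName) {
        if (serialized == null) {
            return null; // assumption: a missing entry means a null partition value
        }
        switch (typeName) {
            case "string":
                return serialized; // string values need no deserialization
            case "integer":
                return Integer.parseInt(serialized);
            case "long":
                return Long.parseLong(serialized);
            case "boolean":
                return Boolean.parseBoolean(serialized);
            default:
                throw new IllegalArgumentException("Type not covered by this sketch: " + typeName);
        }
    }

    /** Evaluates p1 = 'new york' && p2 >= 26 against one file's partitionValues map. */
    static boolean matches(Map<String, String> partitionValues) {
        Object p1 = deserialize(partitionValues.get("p1"), "string");
        Object p2 = deserialize(partitionValues.get("p2"), "integer");
        // Assumption: null partition values never satisfy the predicate.
        return "new york".equals(p1) && p2 != null && (Integer) p2 >= 26;
    }

    public static void main(String[] args) {
        Map<String, String> partitionValues = new HashMap<>();
        partitionValues.put("p1", "new york");
        partitionValues.put("p2", "30");
        System.out.println(matches(partitionValues));
    }
}
```

Comparing p2 as an integer rather than as a string is the point of the deserialization step: as strings, "100" sorts before "26", so a string comparison would give the wrong answer for p2 >= 26.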