public class PartitionUtils
extends Object
| Modifier and Type | Method and Description |
|---|---|
| `static String` | `getTargetDirectory(String dataRoot, java.util.List<String> partitionColNames, java.util.Map<String,Literal> partitionValues)` Get the target directory for writing data for the given partition values. |
| `static StructType` | `physicalSchemaWithoutPartitionColumns(StructType logicalSchema, StructType physicalSchema, java.util.Set<String> columnsToRemove)` Utility method to remove the given columns (passed as columnsToRemove) from the given physicalSchema. |
| `static Predicate` | `rewritePartitionPredicateOnCheckpointFileSchema(Predicate predicate, java.util.Map<String,StructField> partitionColNameToField)` Rewrite the given predicate on partition columns as a predicate on `partitionValues_parsed` in the checkpoint schema. |
| `static Predicate` | `rewritePartitionPredicateOnScanFileSchema(Predicate predicate, java.util.Map<String,StructField> partitionColMetadata)` Utility method to rewrite a partition predicate that refers to the table schema as a predicate that refers to the partitionValues in scan files read from the Delta log. |
| `static MapValue` | `serializePartitionMap(java.util.Map<String,Literal> partitionValueMap)` Convert the given partition values to a MapValue that can be serialized to a Delta commit file. |
| `static Tuple2<Predicate,Predicate>` | `splitMetadataAndDataPredicates(Predicate predicate, java.util.Set<String> partitionColNames)` Split the given predicate into a predicate on partition columns and a predicate on data columns. |
| `static java.util.Map<String,Literal>` | `validateAndSanitizePartitionValues(StructType tableSchema, java.util.List<String> partitionColNames, java.util.Map<String,Literal> partitionValues)` Validate that partitionValues contains a value for every partition column in the table and that the type of each value is correct. |
| `static void` | `validatePredicateOnlyOnPartitionColumns(Predicate predicate, java.util.Set<String> partitionColNames)` Validate that the given predicate references only (and at least one) partition columns. |
| `static ColumnarBatch` | `withPartitionColumns(ExpressionHandler expressionHandler, ColumnarBatch dataBatch, java.util.Map<String,String> partitionValues, StructType schemaWithPartitionCols)` |
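The split performed by splitMetadataAndDataPredicates can be pictured with a toy model. The sketch below does not use Delta Kernel's Predicate or Tuple2 types; it assumes a hypothetical Conjunct record that stands for one AND-ed term together with the column names it references, and a hypothetical Split record for the result. A term is routed to the partition side only when every column it references is a partition column. This routing rule illustrates the general technique and is an assumption, not a description of the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PredicateSplitSketch {

    /** Hypothetical stand-in for one AND-ed term: its text and the columns it references. */
    record Conjunct(String text, Set<String> referencedColumns) {}

    /** Hypothetical result holder: terms on partition columns vs. all other terms. */
    record Split(List<Conjunct> partitionPredicates, List<Conjunct> dataPredicates) {}

    /**
     * Routes a term to the partition side only when it references at least one column
     * and every referenced column is a partition column; anything else goes to the data side.
     */
    static Split split(List<Conjunct> conjuncts, Set<String> partitionColNames) {
        List<Conjunct> partition = new ArrayList<>();
        List<Conjunct> data = new ArrayList<>();
        for (Conjunct c : conjuncts) {
            if (!c.referencedColumns().isEmpty()
                    && partitionColNames.containsAll(c.referencedColumns())) {
                partition.add(c);
            } else {
                data.add(c);
            }
        }
        return new Split(partition, data);
    }

    public static void main(String[] args) {
        List<Conjunct> conjuncts = List.of(
                new Conjunct("p1 = 'new york'", Set.of("p1")),
                new Conjunct("p2 >= 26", Set.of("p2")),
                new Conjunct("price > 10", Set.of("price")),
                new Conjunct("p1 = 'a' OR price < 5", Set.of("p1", "price")));
        Split s = split(conjuncts, Set.of("p1", "p2"));
        // Print the sizes of the partition side and the data side.
        System.out.println(s.partitionPredicates().size() + " " + s.dataPredicates().size());
    }
}
```

With partition columns p1 and p2, running main prints `2 2`: the filters on p1 and p2 go to the partition side, while the filter on price and the OR that mixes p1 with price go to the data side, because a term touching any data column cannot be decided from partition values alone.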
public static StructType physicalSchemaWithoutPartitionColumns(StructType logicalSchema, StructType physicalSchema, java.util.Set<String> columnsToRemove)
Utility method to remove the given columns (passed as columnsToRemove) from the given physicalSchema.
Parameters:
physicalSchema - Physical schema from which the columns are removed.
logicalSchema - Used to build a logical-name-to-physical-name map. Partition column names are in logical space, so the equivalent physical column names must be identified.
columnsToRemove - Logical names of the columns to remove.

public static ColumnarBatch withPartitionColumns(ExpressionHandler expressionHandler, ColumnarBatch dataBatch, java.util.Map<String,String> partitionValues, StructType schemaWithPartitionCols)
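The logicalSchema parameter of physicalSchemaWithoutPartitionColumns exists because partition column names are given in logical space while physicalSchema uses physical names. The following is a minimal sketch of that mapping step with schemas reduced to plain lists of top-level column names. It assumes, hypothetically, that position i in both lists describes the same column; the physical names such as col-1a2b are made up for illustration, and the sketch does not use StructType or reproduce the Delta Kernel implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DropPartitionColumnsSketch {

    /**
     * Removes columns named in logical space from a list of physical column names.
     * Assumption: logicalNames.get(i) and physicalNames.get(i) describe the same column.
     */
    static List<String> physicalWithout(List<String> logicalNames,
                                        List<String> physicalNames,
                                        Set<String> logicalColumnsToRemove) {
        if (logicalNames.size() != physicalNames.size()) {
            throw new IllegalArgumentException("schemas must have the same number of columns");
        }
        // Build a physical-name -> logical-name map by pairing columns by position.
        Map<String, String> physicalToLogical = new HashMap<>();
        for (int i = 0; i < logicalNames.size(); i++) {
            physicalToLogical.put(physicalNames.get(i), logicalNames.get(i));
        }
        // Keep a physical column only if its logical name is not in the removal set.
        List<String> result = new ArrayList<>();
        for (String physical : physicalNames) {
            if (!logicalColumnsToRemove.contains(physicalToLogical.get(physical))) {
                result.add(physical);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> logical = List.of("id", "city", "year");
        List<String> physical = List.of("col-1a2b", "col-3c4d", "col-5e6f");
        System.out.println(physicalWithout(logical, physical, Set.of("city", "year")));
    }
}
```

Running main prints `[col-1a2b]`: the logical names city and year are translated to their physical counterparts before removal, which is why a lookup by logical name alone would not find them in the physical schema.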

public static MapValue serializePartitionMap(java.util.Map<String,Literal> partitionValueMap)
Convert the given partition values to a MapValue that can be serialized to a Delta commit file.
Parameters:
partitionValueMap - Partition values keyed by partition column name. The column names are expected to have the same case as in the schema, because the case of the partition column names is preserved when serializing to the Delta commit file.
Returns:
MapValue representing the serialized partition values that can be written to a Delta commit file.

public static java.util.Map<String,Literal> validateAndSanitizePartitionValues(StructType tableSchema, java.util.List<String> partitionColNames, java.util.Map<String,Literal> partitionValues)
Validate that partitionValues contains a value for every partition column in the table and that the type of each value is correct. Once validated, the partition values are sanitized to match the case of the partition column names in the table schema and returned.
Parameters:
tableSchema - Schema of the table.
partitionColNames - Partition column names. These should come from the table metadata, which retains the same case as the table schema.
partitionValues - Map of partition column name to value, given by the connector.

public static void validatePredicateOnlyOnPartitionColumns(Predicate predicate, java.util.Set<String> partitionColNames)
Validate that the given predicate references only (and at least one) partition columns.
Throws:
IllegalArgumentException - if the predicate does not reference any partition columns or if it references any data columns.

public static Tuple2<Predicate,Predicate> splitMetadataAndDataPredicates(Predicate predicate, java.util.Set<String> partitionColNames)
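Split the given predicate into a predicate on partition columns and a predicate on data columns.
Parameters:
predicate - The predicate to split.
partitionColNames - Names of the partition columns.

public static Predicate rewritePartitionPredicateOnCheckpointFileSchema(Predicate predicate, java.util.Map<String,StructField> partitionColNameToField)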
Rewrite the given predicate on partition columns as a predicate on `partitionValues_parsed` in the checkpoint schema.
Parameters:
predicate - Predicate on partition columns.
partitionColNameToField - Map of partition column name (in lower case) to its StructField.
Returns:
Predicate on `partitionValues_parsed` in `add`.

public static Predicate rewritePartitionPredicateOnScanFileSchema(Predicate predicate, java.util.Map<String,StructField> partitionColMetadata)
Utility method to rewrite a partition predicate that refers to the table schema as a predicate that refers to the partitionValues in the scan files read from the Delta log. The scan file batch is returned by Scan.getScanFiles(Engine).
For example, given the predicate on partition columns `p1 = 'new york' && p2 >= 26`, where p1 is of type string and p2 is of type int, the rewritten expression looks like:
`element_at(Column('add', 'partitionValues'), 'p1') = 'new york' && partition_value(element_at(Column('add', 'partitionValues'), 'p2'), 'integer') >= 26`
The column `add.partitionValues` is of type map(string -> string). Each partition value is serialized as a string according to the Delta protocol. The `partition_value` expression deserializes the string value into a value of the given partition column type. String-type partition values don't need any deserialization.
Parameters:
predicate - Predicate containing filters only on partition columns.
partitionColMetadata - Map of partition column name (in lower case) to its type.

public static String getTargetDirectory(String dataRoot, java.util.List<String> partitionColNames, java.util.Map<String,Literal> partitionValues)
Get the target directory for writing data for the given partition values.
Parameters:
dataRoot - Root directory where the data is stored.
partitionColNames - Partition column names. These are needed to create a target directory structure with consistent levels of directories.
partitionValues - Partition values used to create the target directory.
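The following sketch shows how a target directory could be assembled from partition values, assuming the Hive-style name=value directory layout commonly used by Delta tables. The method name targetDirectory, the String-valued map (instead of Literal), and the error on a missing value are illustrative assumptions. A real implementation would also need to escape special characters in values and handle null partition values, which this sketch omits. Iterating over partitionColNames, rather than over the map, is what keeps the directory levels in a consistent order.

```java
import java.util.List;
import java.util.Map;

public class TargetDirectorySketch {

    /**
     * Builds dataRoot/col1=v1/col2=v2/... in the order given by partitionColNames.
     * Simplification (assumption): values are plain strings, with no escaping or null handling.
     */
    static String targetDirectory(String dataRoot,
                                  List<String> partitionColNames,
                                  Map<String, String> partitionValues) {
        StringBuilder dir = new StringBuilder(dataRoot);
        for (String col : partitionColNames) {
            String value = partitionValues.get(col);
            if (value == null) {
                throw new IllegalArgumentException("missing value for partition column " + col);
            }
            // Add a separator unless the path is empty or already ends with '/'.
            if (dir.length() > 0 && dir.charAt(dir.length() - 1) != '/') {
                dir.append('/');
            }
            dir.append(col).append('=').append(value);
        }
        return dir.toString();
    }

    public static void main(String[] args) {
        System.out.println(targetDirectory(
                "s3://bucket/table",
                List.of("part1", "part2"),
                Map.of("part2", "abc", "part1", "1")));
    }
}
```

Running main prints `s3://bucket/table/part1=1/part2=abc`. The iteration order of Map.of is unspecified, so relying on the list of partition column names is what guarantees the same nesting order for every write.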