public class PartitionUtils
extends Object
| Modifier and Type | Method and Description |
|---|---|
| static String | `getTargetDirectory(String dataRoot, java.util.List<String> partitionColNames, java.util.Map<String,Literal> partitionValues)` Get the target directory for writing data for given partition values. |
| static StructType | `physicalSchemaWithoutPartitionColumns(StructType logicalSchema, StructType physicalSchema, java.util.Set<String> columnsToRemove)` Utility method to remove the given columns (as columnsToRemove) from the given physicalSchema. |
| static Predicate | `rewritePartitionPredicateOnCheckpointFileSchema(Predicate predicate, java.util.Map<String,StructField> partitionColNameToField)` Rewrite the given predicate on partition columns on `partitionValues_parsed` in checkpoint schema. |
| static Predicate | `rewritePartitionPredicateOnScanFileSchema(Predicate predicate, java.util.Map<String,StructField> partitionColMetadata)` Utility method to rewrite the partition predicate referring to the table schema as predicate referring to the partitionValues in scan files read from Delta log. |
| static MapValue | `serializePartitionMap(java.util.Map<String,Literal> partitionValueMap)` Convert the given partition values to a MapValue that can be serialized to a Delta commit file. |
| static Tuple2<Predicate,Predicate> | `splitMetadataAndDataPredicates(Predicate predicate, java.util.Set<String> partitionColNames)` Split the given predicate into predicate on partition columns and predicate on data columns. |
| static java.util.Map<String,Literal> | `validateAndSanitizePartitionValues(StructType tableSchema, java.util.List<String> partitionColNames, java.util.Map<String,Literal> partitionValues)` Validate partitionValues contains values for every partition column in the table and the type of the value is correct. |
| static ColumnarBatch | `withPartitionColumns(ExpressionHandler expressionHandler, ColumnarBatch dataBatch, java.util.Map<String,String> partitionValues, StructType schemaWithPartitionCols)` |

public static StructType physicalSchemaWithoutPartitionColumns(StructType logicalSchema, StructType physicalSchema, java.util.Set<String> columnsToRemove)
Utility method to remove the given columns (as columnsToRemove) from the given physicalSchema.
Parameters:
physicalSchema -
logicalSchema - To create a logical name to physical name map. Partition column names are in logical space and we need to identify the equivalent physical column name.
columnsToRemove -

public static ColumnarBatch withPartitionColumns(ExpressionHandler expressionHandler, ColumnarBatch dataBatch, java.util.Map<String,String> partitionValues, StructType schemaWithPartitionCols)

public static MapValue serializePartitionMap(java.util.Map<String,Literal> partitionValueMap)
Convert the given partition values to a MapValue that can be serialized to a Delta commit file.
Parameters:
partitionValueMap - The partition column names are expected to be in the same case as in the schema. The case of the partition column names is preserved when serializing to the Delta commit file.
Returns:
MapValue representing the serialized partition values that can be written to a Delta commit file.

public static java.util.Map<String,Literal> validateAndSanitizePartitionValues(StructType tableSchema, java.util.List<String> partitionColNames, java.util.Map<String,Literal> partitionValues)
Validate that partitionValues contains values for every partition column in the table and that the type of each value is correct. Once validated, the partition values are sanitized to match the case of the partition column names in the table schema and returned.
Parameters:
tableSchema - Schema of the table.
partitionColNames - Partition column names. These should come from the table metadata, which retains the same case as in the table schema.
partitionValues - Map of partition column name to value, as given by the connector.

public static Tuple2<Predicate,Predicate> splitMetadataAndDataPredicates(Predicate predicate, java.util.Set<String> partitionColNames)
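Split the given predicate into predicate on partition columns and predicate on data columns.
Parameters:
predicate -
partitionColNames -

public static Predicate rewritePartitionPredicateOnCheckpointFileSchema(Predicate predicate, java.util.Map<String,StructField> partitionColNameToField)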
Rewrite the given predicate on partition columns on `partitionValues_parsed` in checkpoint schema.
Parameters:
predicate - Predicate on partition columns.
partitionColNameToField - Map of partition column name (in lower case) to its StructField.
Returns:
Predicate on `partitionValues_parsed` in `add`.

public static Predicate rewritePartitionPredicateOnScanFileSchema(Predicate predicate, java.util.Map<String,StructField> partitionColMetadata)
Utility method to rewrite the partition predicate referring to the table schema as a predicate referring to the partitionValues in scan files read from the Delta log. The scan file batch is returned by Scan.getScanFiles(Engine).

E.g. given a predicate on partition columns:
  p1 = 'new york' && p2 >= 26, where p1 is of type string and p2 is of type int
The rewritten expression looks like:
  element_at(Column('add', 'partitionValues'), 'p1') = 'new york'
  &&
  partition_value(element_at(Column('add', 'partitionValues'), 'p2'), 'integer') >= 26

The column `add.partitionValues` is of type map(string -> string). Each partition value is in string serialization format according to the Delta protocol. The expression `partition_value` deserializes the string value into a value of the given partition column type. String type partition values don't need any deserialization.
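
The rewrite described above can be illustrated with a small standalone sketch. It does not use Delta Kernel classes: `PartitionValueSketch`, `partitionValue`, and `matches` are hypothetical names, only a handful of type names are handled, and passing a null partition value through as null is an assumption of this sketch rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of the Delta Kernel API. */
final class PartitionValueSketch {

    /** Mimics the idea behind partition_value: parse a serialized string into a typed value. */
    static Object partitionValue(String serialized, String typeName) {
        if (serialized == null) {
            return null; // assumption of this sketch: a null partition value stays null
        }
        switch (typeName) {
            case "string":
                return serialized; // string values need no deserialization
            case "integer":
                return Integer.parseInt(serialized);
            case "long":
                return Long.parseLong(serialized);
            case "boolean":
                return Boolean.parseBoolean(serialized);
            default:
                throw new IllegalArgumentException("Type not covered by this sketch: " + typeName);
        }
    }

    /** Evaluates p1 = 'new york' && p2 >= 26 against one file's string-to-string partition map. */
    static boolean matches(Map<String, String> partitionValues) {
        Object p1 = partitionValue(partitionValues.get("p1"), "string");
        Object p2 = partitionValue(partitionValues.get("p2"), "integer");
        return "new york".equals(p1) && p2 != null && (Integer) p2 >= 26;
    }

    public static void main(String[] args) {
        Map<String, String> file = new HashMap<>();
        file.put("p1", "new york");
        file.put("p2", "30");
        System.out.println(matches(file));
    }
}
```

The comparison on p2 happens on the parsed integer rather than on the raw string, which is why the rewritten predicate wraps the map lookup in `partition_value` for non-string columns.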
Parameters:
predicate - Predicate containing filters only on partition columns.
partitionColMetadata - Map of partition column name (in lower case) to its type.

public static String getTargetDirectory(String dataRoot, java.util.List<String> partitionColNames, java.util.Map<String,Literal> partitionValues)
Get the target directory for writing data for given partition values.
Parameters:
dataRoot - Root directory where the data is stored.
partitionColNames - Partition column names. These are needed so that the target directory structure has consistent levels of directories.
partitionValues - Partition values to create the target directory.
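
As a rough illustration of why the ordered partitionColNames list matters, the sketch below builds a directory path with one level per partition column, always in the same order. It assumes a Hive-style `column=value` layout, which this page does not confirm, and it uses plain strings instead of Literal values. `TargetDirectorySketch` and `targetDirectory` are hypothetical names, and escaping of special characters and null handling are left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the Delta Kernel implementation. */
final class TargetDirectorySketch {

    /**
     * Appends one directory level per partition column, in the order given by
     * partitionColNames, so every file lands at the same directory depth.
     * Assumes a Hive-style "column=value" layout; no escaping is applied.
     */
    static String targetDirectory(
            String dataRoot,
            List<String> partitionColNames,
            Map<String, String> partitionValues) {
        StringBuilder path = new StringBuilder(dataRoot);
        for (String col : partitionColNames) {
            if (!partitionValues.containsKey(col)) {
                throw new IllegalArgumentException("Missing value for partition column: " + col);
            }
            path.append('/').append(col).append('=').append(partitionValues.get(col));
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("city", "nyc");
        values.put("year", "2024");
        // The column list, not the map's iteration order, decides the directory order.
        System.out.println(targetDirectory("/data/tbl", Arrays.asList("year", "city"), values));
    }
}
```

Iterating over the column list rather than the map keeps the directory order stable regardless of how the caller built the map.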