Class SchemaUtils

Object
io.delta.kernel.internal.util.SchemaUtils

public class SchemaUtils extends Object
Utility methods for schema-related operations, such as validating that a schema has no duplicate columns and that column names contain only valid characters.
  • Method Details

    • validateSchema

      public static void validateSchema(StructType schema, boolean isColumnMappingEnabled)
      Validates the schema. This method checks that the schema has no duplicate columns, that the column names contain only valid characters, and that the data types are supported.
      Parameters:
      schema - the schema to validate
      isColumnMappingEnabled - whether column mapping is enabled. When column mapping is enabled, the column names in the schema can contain special characters that are allowed as column names in the Parquet file
      Throws:
      IllegalArgumentException - if the schema is invalid
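
The name checks described for validateSchema can be illustrated with a minimal sketch. It is not the Kernel implementation: it works on a plain list of column names instead of a StructType, the class SchemaValidationSketch and method validateColumnNames are hypothetical, and both the case-insensitive duplicate check and the exact set of rejected characters are assumptions. The data type check is not modeled.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in: operates on plain column names, not on Delta Kernel's StructType.
class SchemaValidationSketch {
    // Assumed set of characters rejected when column mapping is disabled.
    private static final String INVALID_CHARS = " ,;{}()\n\t=";

    static void validateColumnNames(List<String> columnNames, boolean isColumnMappingEnabled) {
        Set<String> seen = new HashSet<>();
        for (String name : columnNames) {
            // Assumption: duplicates are detected case-insensitively.
            if (!seen.add(name.toLowerCase(Locale.ROOT))) {
                throw new IllegalArgumentException("Duplicate column: " + name);
            }
            // With column mapping enabled, the character check is skipped.
            if (!isColumnMappingEnabled) {
                for (char c : name.toCharArray()) {
                    if (INVALID_CHARS.indexOf(c) >= 0) {
                        throw new IllegalArgumentException(
                            "Invalid character in column name: " + name);
                    }
                }
            }
        }
    }
}
```

In this sketch, a name such as "order id" is rejected only when isColumnMappingEnabled is false, which mirrors the relaxation described for that parameter.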
    • validatePartitionColumns

      public static void validatePartitionColumns(StructType schema, List<String> partitionCols)
      Verifies that the partition columns exist in the table schema and that each has a data type supported for partition columns.
      Parameters:
      schema - the table schema
      partitionCols - the names of the partition columns
    • casePreservingPartitionColNames

      public static List<String> casePreservingPartitionColNames(StructType tableSchema, List<String> partitionColumns)
      Delta expects partition column names to preserve the case of the corresponding names in the schema. For example, given the schema (a INT, B STRING) and partition columns (b), the schema is stored as (a INT, B STRING) and the partition columns as (B). This method expects that the inputs are already validated (i.e., the schema contains all the partition columns).
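
The (a INT, B STRING) example above can be sketched as follows. This is an illustration only: the class PartitionColumnCaseSketch and method casePreserving are hypothetical, the schema is reduced to a list of field names rather than a StructType, and throwing on a missing column is an assumption (the documented method expects validated inputs).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: the schema is represented only by its field names.
class PartitionColumnCaseSketch {
    static List<String> casePreserving(List<String> schemaFieldNames,
                                       List<String> partitionColumns) {
        List<String> result = new ArrayList<>();
        for (String partCol : partitionColumns) {
            String match = null;
            for (String fieldName : schemaFieldNames) {
                // Compare case-insensitively, but keep the schema's spelling.
                if (fieldName.equalsIgnoreCase(partCol)) {
                    match = fieldName;
                    break;
                }
            }
            if (match == null) {
                // Assumption: the sketch fails loudly on unvalidated input.
                throw new IllegalArgumentException("Not in schema: " + partCol);
            }
            result.add(match);
        }
        return result;
    }
}
```

Calling casePreserving with field names [a, B] and partition columns [b] returns [B] in this sketch.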
    • casePreservingPartitionColNames

      public static Map<String,Literal> casePreservingPartitionColNames(List<String> partitionColNames, Map<String,Literal> partitionValues)
      Converts the partition column names in the partitionValues map to the same case as the corresponding columns in the table metadata. Delta expects partition column names to preserve the case used in the table schema.
      Parameters:
      partitionColNames - List of partition columns in the table metadata. The names preserve the case as given by the connector when the table is created.
      partitionValues - Map of partition column name to partition value. Each partition column name is converted to the case-preserved name of its equivalent column in partitionColNames. Column name comparison is case-insensitive.
      Returns:
      Rewritten partitionValues map with names case preserved.
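
A rough sketch of the key rewrite described above, under stated assumptions: the class PartitionValueCaseSketch is hypothetical, the value type is left generic in place of Kernel's Literal, and rejecting a key that matches no partition column is an assumption rather than documented behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in: V takes the place of Delta Kernel's Literal type.
class PartitionValueCaseSketch {
    static <V> Map<String, V> casePreserving(List<String> partitionColNames,
                                             Map<String, V> partitionValues) {
        // Index the metadata names by their lower-cased form.
        Map<String, String> canonical = new HashMap<>();
        for (String name : partitionColNames) {
            canonical.put(name.toLowerCase(Locale.ROOT), name);
        }
        Map<String, V> rewritten = new HashMap<>();
        for (Map.Entry<String, V> entry : partitionValues.entrySet()) {
            String name = canonical.get(entry.getKey().toLowerCase(Locale.ROOT));
            if (name == null) {
                // Assumption: unknown keys are treated as an error in this sketch.
                throw new IllegalArgumentException(
                    "Unknown partition column: " + entry.getKey());
            }
            // Keep the value, replace the key with the metadata spelling.
            rewritten.put(name, entry.getValue());
        }
        return rewritten;
    }
}
```

Lower-casing both sides once keeps the lookup linear in the number of columns; values are passed through unchanged.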
    • findColIndex

      public static int findColIndex(StructType schema, String colName)
      Search (case-insensitive) for the given colName in the schema and return its position in the schema.
      Parameters:
      schema - the StructType to search
      colName - Name of the column whose index is needed.
      Returns:
      Valid index or -1 if not found.
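
The lookup contract of findColIndex (case-insensitive match, -1 when absent) can be sketched as below. The class ColumnIndexSketch is hypothetical, and the schema is again reduced to a list of field names instead of a StructType.

```java
import java.util.List;

// Hypothetical stand-in: searches a list of field names instead of a StructType.
class ColumnIndexSketch {
    static int findColIndex(List<String> fieldNames, String colName) {
        for (int i = 0; i < fieldNames.size(); i++) {
            // Case-insensitive comparison, as described for findColIndex.
            if (fieldNames.get(i).equalsIgnoreCase(colName)) {
                return i;
            }
        }
        return -1; // not found
    }
}
```

Callers should check for -1 before using the result as an index.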