@ThreadSafe public class TableSchemaResolver extends Object

| Constructor and Description |
|---|
| `TableSchemaResolver(HoodieTableMetaClient metaClient)` |


| Modifier and Type | Method and Description |
|---|---|
| `static org.apache.avro.Schema` | `appendPartitionColumns(org.apache.avro.Schema dataSchema, Option<String[]> partitionFields)` |
| `static org.apache.parquet.schema.MessageType` | `convertAvroSchemaToParquet(org.apache.avro.Schema schema, org.apache.hadoop.conf.Configuration hadoopConf)` |
| `org.apache.avro.Schema` | `getLatestSchema(org.apache.avro.Schema writeSchema, boolean convertTableSchemaToAddNamespace, Functions.Function1<org.apache.avro.Schema,org.apache.avro.Schema> converterFn)`<br>Deprecated. Will be removed (HUDI-4472). |
| `org.apache.avro.Schema` | `getTableAvroSchema()`<br>Gets full schema (user + metadata) for a hoodie table in Avro format. |
| `org.apache.avro.Schema` | `getTableAvroSchema(boolean includeMetadataFields)`<br>Gets the schema for a hoodie table in Avro format, optionally including the metadata fields. |
| `org.apache.avro.Schema` | `getTableAvroSchema(HoodieInstant instant, boolean includeMetadataFields)`<br>Fetches the table's schema in Avro format as of the given instant. |
| `org.apache.avro.Schema` | `getTableAvroSchema(String timestamp)`<br>Fetches the table's schema in Avro format as of the instant identified by the given timestamp. |
| `org.apache.avro.Schema` | `getTableAvroSchemaFromDataFile()` |
| `Option<org.apache.avro.Schema>` | `getTableAvroSchemaFromLatestCommit(boolean includeMetadataFields)`<br>Returns the table's latest Avro schema if and only if the table is non-empty (i.e., it has at least one commit). Unlike `getTableAvroSchema(boolean)`, this method does not fall back to the schema the table was created with. |
| `org.apache.avro.Schema` | `getTableAvroSchemaWithoutMetadataFields()`<br>Deprecated. Use `getTableAvroSchema(boolean)` instead. |
| `Option<String>` | `getTableHistorySchemaStrFromCommitMetadata()`<br>Gets the history schemas as a String for a hoodie table from the `HoodieCommitMetadata` of the instant. |
| `Option<InternalSchema>` | `getTableInternalSchemaFromCommitMetadata()`<br>Gets the `InternalSchema` for a hoodie table from the `HoodieCommitMetadata` of the instant. |
| `Option<InternalSchema>` | `getTableInternalSchemaFromCommitMetadata(String timestamp)`<br>Gets the `InternalSchema` for a hoodie table from the `HoodieCommitMetadata` of the instant. |
| `org.apache.parquet.schema.MessageType` | `getTableParquetSchema()`<br>Gets full schema (user + metadata) for a hoodie table in Parquet format. |
| `boolean` | `hasOperationField()`<br>NOTE: This method should only be used in tests. |
| `static boolean` | `isSchemaCompatible(org.apache.avro.Schema oldSchema, org.apache.avro.Schema newSchema)`<br>HUDI-specific validation of schema evolution. |
| `static boolean` | `isSchemaCompatible(String oldSchema, String newSchema)` |
| `org.apache.parquet.schema.MessageType` | `readSchemaFromLastCompaction(Option<HoodieInstant> lastCompactionCommitOpt)`<br>Deprecated. Use `getTableAvroSchema(HoodieInstant, boolean)` instead. |
| `static org.apache.parquet.schema.MessageType` | `readSchemaFromLogFile(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)`<br>Reads the schema from the log file at the given path. |
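
`isSchemaCompatible(oldSchema, newSchema)` is summarized only as HUDI-specific validation of schema evolution. As a rough illustration of what such a check involves, here is a toy version over a field-name-to-type map. The rules are assumptions chosen for this sketch, not taken from this page or from Hudi's source: an existing field may not be removed, renamed or change type, and a newly added field must declare a default so records written with the old schema stay readable. This is stricter than plain Avro schema resolution (which tolerates dropped fields and allows some type promotions), and `EvolutionCheckSketch` and `Field` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy schema-evolution check in the spirit of isSchemaCompatible(oldSchema, newSchema).
// ASSUMED rules, not Hudi's implementation:
//   1. every old field must still exist in the new schema with the same type;
//   2. every field added by the new schema must declare a default value.
class EvolutionCheckSketch {
  static final class Field {
    final String type;
    final boolean hasDefault;

    Field(String type, boolean hasDefault) {
      this.type = type;
      this.hasDefault = hasDefault;
    }
  }

  static boolean isCompatible(Map<String, Field> oldSchema, Map<String, Field> newSchema) {
    // Rule 1: no deletes, renames (a rename looks like delete + add) or type changes.
    for (Map.Entry<String, Field> old : oldSchema.entrySet()) {
      Field updated = newSchema.get(old.getKey());
      if (updated == null || !updated.type.equals(old.getValue().type)) {
        return false;
      }
    }
    // Rule 2: added fields need a default for records written with the old schema.
    for (Map.Entry<String, Field> added : newSchema.entrySet()) {
      if (!oldSchema.containsKey(added.getKey()) && !added.getValue().hasDefault) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, Field> v1 = new LinkedHashMap<>();
    v1.put("id", new Field("long", false));
    Map<String, Field> v2 = new LinkedHashMap<>(v1);
    v2.put("price", new Field("double", true)); // added with a default: allowed
    Map<String, Field> v3 = new LinkedHashMap<>(v1);
    v3.put("qty", new Field("int", false));     // added without a default: rejected
    System.out.println(isCompatible(v1, v2)); // true
    System.out.println(isCompatible(v1, v3)); // false
    System.out.println(isCompatible(v2, v1)); // false ("price" was removed)
  }
}
```

Checking the old fields first and the added fields second keeps the two assumed rules separate, so either one can be relaxed independently if the real validation turns out to differ.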

`public TableSchemaResolver(HoodieTableMetaClient metaClient)`

`public org.apache.avro.Schema getTableAvroSchemaFromDataFile()`

`public org.apache.avro.Schema getTableAvroSchema() throws Exception`
- Throws: `Exception`

`public org.apache.avro.Schema getTableAvroSchema(boolean includeMetadataFields) throws Exception`
- Parameters:
  - `includeMetadataFields` - whether to include the metadata fields
- Throws: `Exception`

`public org.apache.avro.Schema getTableAvroSchema(String timestamp) throws Exception`
- Parameters:
  - `timestamp` - the timestamp as of which the table's schema will be fetched
- Throws: `Exception`

`public org.apache.avro.Schema getTableAvroSchema(HoodieInstant instant, boolean includeMetadataFields) throws Exception`
- Parameters:
  - `instant` - the instant as of which the table's schema will be fetched
- Throws: `Exception`

`public org.apache.parquet.schema.MessageType getTableParquetSchema() throws Exception`
- Throws: `Exception`

`@Deprecated public org.apache.avro.Schema getTableAvroSchemaWithoutMetadataFields() throws Exception`

Deprecated. Use `getTableAvroSchema(boolean)` instead.
- Throws: `Exception`

`public static org.apache.parquet.schema.MessageType convertAvroSchemaToParquet(org.apache.avro.Schema schema, org.apache.hadoop.conf.Configuration hadoopConf)`
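
The `getTableAvroSchema` overloads vary along two axes: which schema version is read (the latest one, or the one as of a given instant) and whether the metadata fields are kept. The following is a minimal, standard-library-only model of both ideas, not Hudi's implementation. A "schema" is just an ordered list of field names rather than an `org.apache.avro.Schema`, the commit timeline is a `TreeMap` from instant time to schema, and `SchemaTimelineSketch`, `commit`, `schemaAsOf` and `META_FIELDS` are hypothetical names. It assumes that instant times sort lexicographically in chronological order and that the metadata fields are the five `_hoodie_*` columns listed in the code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical, stdlib-only model of getTableAvroSchema(...) semantics.
// A "schema" here is an ordered list of field names, NOT an org.apache.avro.Schema.
class SchemaTimelineSketch {
  // Assumed names of Hudi's standard metadata columns.
  static final List<String> META_FIELDS = List.of(
      "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
      "_hoodie_partition_path", "_hoodie_file_name");

  // Commit timeline: instant time -> full schema (metadata + user fields) of that commit.
  // TreeMap orders instant times lexicographically, assumed here to be chronological.
  private final NavigableMap<String, List<String>> timeline = new TreeMap<>();

  void commit(String instantTime, List<String> fullSchema) {
    timeline.put(instantTime, List.copyOf(fullSchema));
  }

  // Schema as of an instant: the schema of the latest commit at or before that instant.
  List<String> schemaAsOf(String instantTime, boolean includeMetadataFields) {
    Map.Entry<String, List<String>> entry = timeline.floorEntry(instantTime);
    if (entry == null) {
      throw new IllegalArgumentException("No commit at or before " + instantTime);
    }
    if (includeMetadataFields) {
      return entry.getValue();
    }
    // Keep only the user fields by dropping the metadata columns.
    List<String> userFields = new ArrayList<>();
    for (String field : entry.getValue()) {
      if (!META_FIELDS.contains(field)) {
        userFields.add(field);
      }
    }
    return userFields;
  }

  public static void main(String[] args) {
    SchemaTimelineSketch table = new SchemaTimelineSketch();
    List<String> v1 = new ArrayList<>(META_FIELDS);
    v1.add("id");
    List<String> v2 = new ArrayList<>(v1);
    v2.add("price"); // column added by a later commit
    table.commit("20220101000000", v1);
    table.commit("20220301000000", v2);
    System.out.println(table.schemaAsOf("20220201000000", false)); // [id]
    System.out.println(table.schemaAsOf("20220401000000", false)); // [id, price]
  }
}
```

Using `floorEntry` makes "as of" mean the latest commit at or before the requested instant. The sketch throws when no such commit exists; how the real method behaves in that case is not stated on this page.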

`public static boolean isSchemaCompatible(org.apache.avro.Schema oldSchema, org.apache.avro.Schema newSchema)`
- Parameters:
  - `oldSchema` - older schema to check
  - `newSchema` - newer schema to check

`public Option<org.apache.avro.Schema> getTableAvroSchemaFromLatestCommit(boolean includeMetadataFields) throws Exception`

Returns the table's latest Avro schema if and only if the table is non-empty (i.e., it has at least one commit). Unlike `getTableAvroSchema(boolean)`, this method does not fall back to the schema the table was created with.
- Throws: `Exception`

`@Deprecated public org.apache.avro.Schema getLatestSchema(org.apache.avro.Schema writeSchema, boolean convertTableSchemaToAddNamespace, Functions.Function1<org.apache.avro.Schema,org.apache.avro.Schema> converterFn)`

Deprecated. Will be removed (HUDI-4472).
- Parameters:
  - `writeSchema` - the incoming batch's write schema
  - `convertTableSchemaToAddNamespace` - `true` if the table schema needs to be converted, `false` otherwise
  - `converterFn` - converter function applied to the table schema (for example, to add a namespace); each caller decides whether any conversion is required

`public org.apache.parquet.schema.MessageType readSchemaFromLastCompaction(Option<HoodieInstant> lastCompactionCommitOpt) throws Exception`

Deprecated. Use `getTableAvroSchema(HoodieInstant, boolean)` instead.
- Throws: `Exception`

`public static org.apache.parquet.schema.MessageType readSchemaFromLogFile(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path) throws IOException`
- Throws: `IOException`

`public Option<InternalSchema> getTableInternalSchemaFromCommitMetadata()`
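
The documented difference between `getTableAvroSchemaFromLatestCommit(boolean)` and `getTableAvroSchema(boolean)` is whether an empty table falls back to the schema it was created with. The hypothetical model below captures only that contract: it uses the standard library, represents schemas as plain strings, uses `java.util.Optional` in place of Hudi's `Option`, and collapses any other lookup steps the real resolver may perform. `FallbackSketch` and its methods are illustrative names.

```java
import java.util.Optional;

// Hypothetical model of the two lookup contracts; schemas are plain strings here.
class FallbackSketch {
  private final String creationSchema; // schema the table was created with
  private String latestCommitSchema;   // stays null until the first commit

  FallbackSketch(String creationSchema) {
    this.creationSchema = creationSchema;
  }

  void commit(String schema) {
    latestCommitSchema = schema;
  }

  // Like getTableAvroSchemaFromLatestCommit: present iff at least one commit exists.
  Optional<String> schemaFromLatestCommit() {
    return Optional.ofNullable(latestCommitSchema);
  }

  // Like getTableAvroSchema: falls back to the creation-time schema on an empty table.
  String schemaWithFallback() {
    return schemaFromLatestCommit().orElse(creationSchema);
  }

  public static void main(String[] args) {
    FallbackSketch table = new FallbackSketch("{id}");
    System.out.println(table.schemaFromLatestCommit().isPresent()); // false
    System.out.println(table.schemaWithFallback());                 // {id}
    table.commit("{id, price}");
    System.out.println(table.schemaFromLatestCommit().get());       // {id, price}
  }
}
```

Returning an optional value lets a caller tell "no commits yet" apart from "schema found" without catching an exception or checking for a sentinel.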

`public Option<InternalSchema> getTableInternalSchemaFromCommitMetadata(String timestamp)`

`public Option<String> getTableHistorySchemaStrFromCommitMetadata()`

`public boolean hasOperationField()`

Copyright © 2022 The Apache Software Foundation. All rights reserved.