I - Type of input for the write client

public abstract class HoodieBackedTableMetadataWriter<I> extends Object implements HoodieTableMetadataWriter
| Modifier and Type | Field and Description |
|---|---|
| protected HoodieTableMetaClient | dataMetaClient |
| protected HoodieWriteConfig | dataWriteConfig |
| protected List<MetadataPartitionType> | enabledPartitionTypes |
| protected HoodieEngineContext | engineContext |
| protected SerializableConfiguration | hadoopConf |
| protected HoodieBackedTableMetadata | metadata |
| static String | METADATA_COMPACTION_TIME_SUFFIX |
| protected HoodieTableMetaClient | metadataMetaClient |
| protected HoodieWriteConfig | metadataWriteConfig |
| protected Option<HoodieMetadataMetrics> | metrics |
| Modifier | Constructor and Description |
|---|---|
| protected | HoodieBackedTableMetadataWriter(org.apache.hadoop.conf.Configuration hadoopConf, HoodieWriteConfig writeConfig, HoodieFailedWritesCleaningPolicy failedWritesCleaningPolicy, HoodieEngineContext engineContext, Option<String> inflightInstantTimestamp) Hudi backed table metadata writer. |
| Modifier and Type | Method and Description |
|---|---|
| void | buildMetadataPartitions(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) Builds the given metadata partitions to create index. |
| protected abstract void | bulkCommit(String instantTime, MetadataPartitionType partitionType, HoodieData<HoodieRecord> records, int fileGroupCount) Commit the HoodieRecords to Metadata Table as a new delta-commit using bulk commit (if supported). |
| protected static void | checkNumDeltaCommits(HoodieTableMetaClient metaClient, int maxNumDeltaCommitsWhenPending) |
| protected void | cleanIfNecessary(BaseHoodieWriteClient writeClient, String instantTime) |
| void | close() |
| protected void | closeInternal() |
| protected abstract void | commit(String instantTime, Map<MetadataPartitionType,HoodieData<HoodieRecord>> partitionRecordsMap) Commit the HoodieRecords to Metadata Table as a new delta-commit. |
| protected void | commitInternal(String instantTime, Map<MetadataPartitionType,HoodieData<HoodieRecord>> partitionRecordsMap, boolean isInitializing, Option<BulkInsertPartitioner> bulkInsertPartitioner) |
| protected void | compactIfNecessary(BaseHoodieWriteClient writeClient, String latestDeltacommitTime) Perform a compaction on the Metadata Table. |
| protected abstract I | convertHoodieDataToEngineSpecificData(HoodieData<HoodieRecord> records) Converts the input records to the input format expected by the write client. |
| void | dropMetadataPartitions(List<MetadataPartitionType> metadataPartitions) Drop the given metadata partitions. |
| List<MetadataPartitionType> | getEnabledPartitionTypes() |
| HoodieBackedTableMetadata | getTableMetadata() |
| protected BaseHoodieWriteClient<?,I,?,?> | getWriteClient() |
| HoodieWriteConfig | getWriteConfig() |
| protected boolean | initializeIfNeeded(HoodieTableMetaClient dataMetaClient, Option<String> inflightInstantTimestamp) Initialize the metadata table if needed. |
| protected abstract BaseHoodieWriteClient<?,I,?,?> | initializeWriteClient() |
| protected abstract void | initRegistry() |
| boolean | isInitialized() Returns true if the metadata table is initialized. |
| void | performTableServices(Option<String> inFlightInstantTimestamp) Optimize the metadata table by running compaction, clean and archive as required. |
| protected HoodieData<HoodieRecord> | prepRecords(Map<MetadataPartitionType,HoodieData<HoodieRecord>> partitionRecordsMap) Tag each record with the location in the given partition. |
| protected void | preWrite(String instantTime) Allows the implementation to perform any pre-commit operations like transitioning a commit to inflight if required. |
| void | update(HoodieCleanMetadata cleanMetadata, String instantTime) Update from HoodieCleanMetadata. |
| void | update(HoodieCommitMetadata commitMetadata, HoodieData<WriteStatus> writeStatus, String instantTime) Update from HoodieCommitMetadata. |
| void | update(HoodieRestoreMetadata restoreMetadata, String instantTime) Update from HoodieRestoreMetadata. |
| void | update(HoodieRollbackMetadata rollbackMetadata, String instantTime) Update from HoodieRollbackMetadata. |
| protected void | validateRollback(String commitToRollbackInstantTime, HoodieInstant compactionInstant, HoodieTimeline deltacommitsSinceCompaction) |
| protected boolean | validateTimelineBeforeSchedulingCompaction(Option<String> inFlightInstantTimestamp, String latestDeltaCommitTimeInMetadataTable) Validates the timeline for both main and metadata tables to ensure compaction on MDT can be scheduled. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface HoodieTableMetadataWriter: deletePartitions

public static final String METADATA_COMPACTION_TIME_SUFFIX
protected HoodieWriteConfig metadataWriteConfig
protected HoodieWriteConfig dataWriteConfig
protected HoodieBackedTableMetadata metadata
protected HoodieTableMetaClient metadataMetaClient
protected HoodieTableMetaClient dataMetaClient
protected Option<HoodieMetadataMetrics> metrics
protected SerializableConfiguration hadoopConf
protected final transient HoodieEngineContext engineContext
protected final List<MetadataPartitionType> enabledPartitionTypes
protected HoodieBackedTableMetadataWriter(org.apache.hadoop.conf.Configuration hadoopConf,
HoodieWriteConfig writeConfig,
HoodieFailedWritesCleaningPolicy failedWritesCleaningPolicy,
HoodieEngineContext engineContext,
Option<String> inflightInstantTimestamp)
Parameters:
hadoopConf - Hadoop configuration to use for the metadata writer
writeConfig - Writer config
failedWritesCleaningPolicy - Cleaning policy on failed writes
engineContext - Engine context
inflightInstantTimestamp - Timestamp of any instant in progress

protected abstract void initRegistry()
public HoodieWriteConfig getWriteConfig()
public HoodieBackedTableMetadata getTableMetadata()
public List<MetadataPartitionType> getEnabledPartitionTypes()
protected boolean initializeIfNeeded(HoodieTableMetaClient dataMetaClient, Option<String> inflightInstantTimestamp) throws IOException
Parameters:
dataMetaClient - meta client for the data table
inflightInstantTimestamp - timestamp of an instant in progress on the dataset
Throws:
IOException - on errors

public void dropMetadataPartitions(List<MetadataPartitionType> metadataPartitions) throws IOException
Description copied from interface: HoodieTableMetadataWriter
Drop the given metadata partitions.
Specified by:
dropMetadataPartitions in interface HoodieTableMetadataWriter
Parameters:
metadataPartitions - List of MDT partitions to drop
Throws:
IOException - on failures

protected static void checkNumDeltaCommits(HoodieTableMetaClient metaClient, int maxNumDeltaCommitsWhenPending)
public void buildMetadataPartitions(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) throws IOException
Description copied from interface: HoodieTableMetadataWriter
Builds the given metadata partitions to create index.
Specified by:
buildMetadataPartitions in interface HoodieTableMetadataWriter
Parameters:
indexPartitionInfos - information about partitions to build such as partition type and base instant time
Throws:
IOException

public void update(HoodieCommitMetadata commitMetadata, HoodieData<WriteStatus> writeStatus, String instantTime)
Update from HoodieCommitMetadata.
Specified by:
update in interface HoodieTableMetadataWriter
Parameters:
commitMetadata - HoodieCommitMetadata
instantTime - Timestamp at which the commit was performed

public void update(HoodieCleanMetadata cleanMetadata, String instantTime)
Update from HoodieCleanMetadata.
Specified by:
update in interface HoodieTableMetadataWriter
Parameters:
cleanMetadata - HoodieCleanMetadata
instantTime - Timestamp at which the clean was completed

public void update(HoodieRestoreMetadata restoreMetadata, String instantTime)
Update from HoodieRestoreMetadata.
Specified by:
update in interface HoodieTableMetadataWriter
Parameters:
restoreMetadata - HoodieRestoreMetadata
instantTime - Timestamp at which the restore was performed

public void update(HoodieRollbackMetadata rollbackMetadata, String instantTime)
Update from HoodieRollbackMetadata.
Specified by:
update in interface HoodieTableMetadataWriter
Parameters:
rollbackMetadata - HoodieRollbackMetadata
instantTime - Timestamp at which the rollback was performed

protected void validateRollback(String commitToRollbackInstantTime, HoodieInstant compactionInstant, HoodieTimeline deltacommitsSinceCompaction)
public void close() throws Exception
Specified by:
close in interface AutoCloseable
Throws:
Exception

protected abstract void commit(String instantTime, Map<MetadataPartitionType,HoodieData<HoodieRecord>> partitionRecordsMap)
Commit the HoodieRecords to Metadata Table as a new delta-commit.
Parameters:
instantTime - Action instant time for this commit
partitionRecordsMap - Map of partition type to its records to commit

protected abstract I convertHoodieDataToEngineSpecificData(HoodieData<HoodieRecord> records)
Parameters:
records - records to be converted

protected void commitInternal(String instantTime, Map<MetadataPartitionType,HoodieData<HoodieRecord>> partitionRecordsMap, boolean isInitializing, Option<BulkInsertPartitioner> bulkInsertPartitioner)
protected void preWrite(String instantTime)
Parameters:
instantTime - time of commit

protected abstract void bulkCommit(String instantTime, MetadataPartitionType partitionType, HoodieData<HoodieRecord> records, int fileGroupCount)
Commit the HoodieRecords to Metadata Table as a new delta-commit using bulk commit (if supported).
This is used to optimize the initial commit to the MDT partition, which may contain a large number of records and hence is better suited to bulkInsert for write performance.
Parameters:
instantTime - Action instant time for this commit
partitionType - The MDT partition to which records are to be committed
records - records to be bulk inserted
fileGroupCount - The maximum number of file groups to which the records will be written.

protected HoodieData<HoodieRecord> prepRecords(Map<MetadataPartitionType,HoodieData<HoodieRecord>> partitionRecordsMap)
public void performTableServices(Option<String> inFlightInstantTimestamp)
Don't perform optimization if there are inflight operations on the dataset. This is for two reasons:
- The compaction will contain the correct data as all failed operations have been rolled back.
- Clean/compaction etc. will have the highest timestamp on the MDT and we won't be adding new operations with smaller timestamps to the metadata table (makes for easier debugging).
This adds the limitation that long-running async operations (clustering, etc.) may delay such MDT optimizations. We will relax this after MDT code has been hardened.
Specified by:
performTableServices in interface HoodieTableMetadataWriter
Parameters:
inFlightInstantTimestamp - Timestamp of an instant which is in-progress. This instant is ignored while deciding if optimizations can be performed.

protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String latestDeltacommitTime)
Cases to be handled:
1. We cannot perform compaction if there are previous inflight operations on the dataset. This is because a compacted metadata base file at time Tx should represent all the actions on the dataset till time Tx.
2. In multi-writer scenario, a parallel operation with a greater instantTime may have completed creating a deltacommit.
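
The inflight check described above for table services and compaction can be illustrated with a minimal, self-contained sketch. It does not use Hudi classes: `TableServiceGateSketch`, `Instant`, and `canRunTableServices` are hypothetical names introduced only for illustration, and the real writer's timeline checks may differ.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not Hudi code: models a data-table timeline as a list of instants.
final class TableServiceGateSketch {

    // Stand-in for a timeline instant: a timestamp plus a completion flag.
    static final class Instant {
        final String timestamp;
        final boolean completed;

        Instant(String timestamp, boolean completed) {
            this.timestamp = timestamp;
            this.completed = completed;
        }
    }

    // Returns true only if no instant is still in flight, other than the one
    // the caller asks to ignore (the operation that is currently committing).
    static boolean canRunTableServices(List<Instant> dataTimeline,
                                       Optional<String> inFlightInstantTimestamp) {
        return dataTimeline.stream()
            .filter(i -> !i.completed)
            .noneMatch(i -> inFlightInstantTimestamp
                .map(ignored -> !ignored.equals(i.timestamp))
                .orElse(true));
    }

    public static void main(String[] args) {
        List<Instant> timeline = List.of(new Instant("001", true), new Instant("002", false));
        // "002" is the caller's own in-progress instant, so it is ignored.
        System.out.println(canRunTableServices(timeline, Optional.of("002"))); // true
        // Without an instant to ignore, "002" blocks table services.
        System.out.println(canRunTableServices(timeline, Optional.empty()));   // false
    }
}
```

In this sketch a single pending instant that is not the ignored one is enough to skip compaction and cleaning, which mirrors the stated trade-off that long-running async operations can delay MDT optimizations.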
protected void cleanIfNecessary(BaseHoodieWriteClient writeClient, String instantTime)
protected boolean validateTimelineBeforeSchedulingCompaction(Option<String> inFlightInstantTimestamp, String latestDeltaCommitTimeInMetadataTable)
protected void closeInternal()
public boolean isInitialized()
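Description copied from interface: HoodieTableMetadataWriter
Returns true if the metadata table is initialized.
Specified by:
isInitialized in interface HoodieTableMetadataWriter

protected BaseHoodieWriteClient<?,I,?,?> getWriteClient()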
protected abstract BaseHoodieWriteClient<?,I,?,?> initializeWriteClient()
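
The relationship between the type parameter I, the abstract conversion method, and the write client might be pictured with the simplified sketch below. Every type in it (`EngineSpecificWriterSketch`, `WriteClientSketch`, `MetadataWriterSketch`, `ArrayBackedWriter`) is a hypothetical stand-in defined inside the block, records are modeled as plain strings, and creating the client lazily in `getWriteClient` is an assumption of this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hudi code: an engine-agnostic base class parameterized
// by the input type I that its engine-specific write client expects.
final class EngineSpecificWriterSketch {

    // Stand-in for a write client that accepts engine-specific input.
    interface WriteClientSketch<I> {
        void write(String instantTime, I input);
    }

    abstract static class MetadataWriterSketch<I> implements AutoCloseable {
        private WriteClientSketch<I> writeClient;

        // Subclasses supply the client and the record conversion for their engine.
        protected abstract WriteClientSketch<I> initializeWriteClient();

        protected abstract I convertToEngineSpecificData(List<String> records);

        // Assumption for this sketch: the client is created on first use, then reused.
        protected WriteClientSketch<I> getWriteClient() {
            if (writeClient == null) {
                writeClient = initializeWriteClient();
            }
            return writeClient;
        }

        public void commit(String instantTime, List<String> records) {
            getWriteClient().write(instantTime, convertToEngineSpecificData(records));
        }

        @Override
        public void close() {
            writeClient = null;
        }
    }

    // A toy "engine" whose client consumes String arrays.
    static final class ArrayBackedWriter extends MetadataWriterSketch<String[]> {
        final List<String> written = new ArrayList<>();

        @Override
        protected WriteClientSketch<String[]> initializeWriteClient() {
            return (instantTime, input) -> {
                for (String record : input) {
                    written.add(instantTime + ":" + record);
                }
            };
        }

        @Override
        protected String[] convertToEngineSpecificData(List<String> records) {
            return records.toArray(new String[0]);
        }
    }

    public static void main(String[] args) {
        // try-with-resources works because the writer is AutoCloseable.
        try (ArrayBackedWriter writer = new ArrayBackedWriter()) {
            writer.commit("001", List.of("a", "b"));
            System.out.println(writer.written); // [001:a, 001:b]
        }
    }
}
```

The base class never needs to know what I is; only the subclass that builds the client and converts the records does.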
Copyright © 2023 The Apache Software Foundation. All rights reserved.