Type Parameters:
T - Sub type of HoodieRecordPayload
I - Type of inputs
K - Type of keys
O - Type of outputs

public abstract class HoodieTable<T,I,K,O> extends Object implements Serializable

| Modifier and Type | Field and Description |
|---|---|
| protected HoodieWriteConfig | config |
| protected HoodieEngineContext | context |
| protected HoodieIndex<?,?> | index |
| protected HoodieTableMetaClient | metaClient |
| protected TaskContextSupplier | taskContextSupplier |

| Modifier | Constructor and Description |
|---|---|
| protected | HoodieTable(HoodieWriteConfig config, HoodieEngineContext context, HoodieTableMetaClient metaClient) |

| Modifier and Type | Method and Description |
|---|---|
| abstract HoodieBootstrapWriteMetadata<O> | bootstrap(HoodieEngineContext context, Option<Map<String,String>> extraMetadata) Perform metadata/full bootstrap of a Hudi table. |
| abstract HoodieWriteMetadata<O> | bulkInsert(HoodieEngineContext context, String instantTime, I records, Option<BulkInsertPartitioner> bulkInsertPartitioner) Bulk insert a batch of new records into the Hoodie table at the supplied instantTime. |
| abstract HoodieWriteMetadata<O> | bulkInsertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords, Option<BulkInsertPartitioner> bulkInsertPartitioner) Bulk insert the given prepared records into the Hoodie table at the supplied instantTime. |
| abstract HoodieCleanMetadata | clean(HoodieEngineContext context, String cleanInstantTime) Executes a new clean action. |
| HoodieCleanMetadata | clean(HoodieEngineContext context, String cleanInstantTime, boolean skipLocking) Deprecated. |
| abstract HoodieWriteMetadata<O> | cluster(HoodieEngineContext context, String clusteringInstantTime) Execute clustering on the table. |
| abstract HoodieWriteMetadata<O> | compact(HoodieEngineContext context, String compactionInstantTime) Run compaction on the table. |
| abstract HoodieWriteMetadata<O> | delete(HoodieEngineContext context, String instantTime, K keys) |
| void | deleteMetadataIndexIfNecessary() Deletes the metadata partition if the writer disables any metadata index. |
| abstract HoodieWriteMetadata<O> | deletePartitions(HoodieEngineContext context, String instantTime, List<String> partitions) Deletes all data in the given partitions. |
| abstract HoodieWriteMetadata<O> | deletePrepped(HoodieEngineContext context, String instantTime, I preppedRecords) Delete records from the Hoodie table based on the HoodieKey and HoodieRecordLocation specified in preppedRecords. |
| void | finalizeWrite(HoodieEngineContext context, String instantTs, List<HoodieWriteStat> stats) Finalize the written data onto storage. |
| HoodieActiveTimeline | getActiveTimeline() |
| String | getBaseFileExtension() |
| HoodieFileFormat | getBaseFileFormat() |
| TableFileSystemView.BaseFileOnlyView | getBaseFileOnlyView() Get the base file only view of the file system for this table. |
| HoodieTimeline | getCleanTimeline() Get clean timeline. |
| HoodieTimeline | getCompletedCleanTimeline() Get only the completed (no-inflights) clean timeline. |
| HoodieTimeline | getCompletedCommitsTimeline() Get only the completed (no-inflights) commit + deltacommit timeline. |
| HoodieTimeline | getCompletedCommitTimeline() Get only the completed (no-inflights) commit timeline. |
| HoodieTimeline | getCompletedSavepointTimeline() Get only the completed (no-inflights) savepoint timeline. |
| HoodieWriteConfig | getConfig() |
| static ConsistencyGuard | getConsistencyGuard(org.apache.hadoop.fs.FileSystem fs, ConsistencyGuardConfig consistencyGuardConfig) Instantiate ConsistencyGuard based on configs. |
| HoodieEngineContext | getContext() |
| TableFileSystemView | getFileSystemView() Get the view of the file system for this table. |
| org.apache.hadoop.conf.Configuration | getHadoopConf() |
| SyncableFileSystemView | getHoodieView() Get complete view of the file system for this table with ability to force sync. |
| HoodieIndex<?,?> | getIndex() Return the index. |
| protected abstract HoodieIndex<?,?> | getIndex(HoodieWriteConfig config, HoodieEngineContext context) |
| Option<HoodieTableMetadataWriter> | getIndexingMetadataWriter(String triggeringInstantTimestamp) Gets the metadata writer for async indexer. |
| protected Set<String> | getInvalidDataPaths(WriteMarkers markers) Returns the possibly invalid data file names derived from the given marker files. |
| HoodieFileFormat | getLogFileFormat() |
| HoodieTableMetaClient | getMetaClient() |
| HoodieTableMetadata | getMetadata() |
| HoodieTableMetadata | getMetadataTable() |
| Option<HoodieTableMetadataWriter> | getMetadataWriter(String triggeringInstantTimestamp) Get the table metadata writer. |
| protected Option<HoodieTableMetadataWriter> | getMetadataWriter(String triggeringInstantTimestamp, HoodieFailedWritesCleaningPolicy failedWritesCleaningPolicy) Get the table metadata writer. |
| Option<HoodieFileFormat> | getPartitionMetafileFormat() |
| HoodieTimeline | getPendingCommitTimeline() Get only the inflight (non-completed) commit timeline. |
| Runnable | getPreExecuteRunnable() |
| HoodieTimeline | getRestoreTimeline() Get restore timeline. |
| HoodieTimeline | getRollbackTimeline() Get rollback timeline. |
| Set<String> | getSavepointTimestamps() Get the set of savepoint timestamps in this table. |
| TableFileSystemView.SliceView | getSliceView() Get the full view of the file system for this table. |
| HoodieStorageLayout | getStorageLayout() |
| protected HoodieStorageLayout | getStorageLayout(HoodieWriteConfig config) |
| TaskContextSupplier | getTaskContextSupplier() |
| abstract Option<HoodieIndexCommitMetadata> | index(HoodieEngineContext context, String indexInstantTime) Execute the requested index action. |
| abstract HoodieWriteMetadata<O> | insert(HoodieEngineContext context, String instantTime, I records) Insert a batch of new records into the Hoodie table at the supplied instantTime. |
| abstract HoodieWriteMetadata<O> | insertOverwrite(HoodieEngineContext context, String instantTime, I records) Replaces all the existing records and inserts the specified new records into the Hoodie table at the supplied instantTime, for the partition paths contained in input records. |
| abstract HoodieWriteMetadata<O> | insertOverwriteTable(HoodieEngineContext context, String instantTime, I records) Deletes all the existing records of the Hoodie table and inserts the specified new records into the Hoodie table at the supplied instantTime, for the partition paths contained in input records. |
| abstract HoodieWriteMetadata<O> | insertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords) Inserts the given prepared records into the Hoodie table at the supplied instantTime. |
| boolean | isMetadataTable() |
| boolean | isPartitioned() |
| HoodieWriteMetadata<O> | logCompact(HoodieEngineContext context, String logCompactionInstantTime) Run log compaction on the table. |
| void | maybeDeleteMetadataTable() Deletes the metadata table if the writer disables the metadata table with hoodie.metadata.enable=false. |
| protected void | reconcileAgainstMarkers(HoodieEngineContext context, String instantTs, List<HoodieWriteStat> stats, boolean consistencyCheckEnabled) Reconciles WriteStats and marker files to detect and safely delete duplicate data files created because of Spark retries. |
| boolean | requireSortedRecords() |
| abstract HoodieRestoreMetadata | restore(HoodieEngineContext context, String restoreInstantTimestamp, String savepointToRestoreTimestamp) Restore the table to the given instant. |
| abstract HoodieRollbackMetadata | rollback(HoodieEngineContext context, String rollbackInstantTime, HoodieInstant commitInstant, boolean deleteInstants, boolean skipLocking) Roll back the (inflight/committed) record changes with the given commit time. |
| abstract void | rollbackBootstrap(HoodieEngineContext context, String instantTime) Perform rollback of bootstrap of a Hudi table. |
| void | rollbackInflightClustering(HoodieInstant inflightInstant, Function<String,Option<HoodiePendingRollbackInfo>> getPendingRollbackInstantFunc) Roll back an inflight clustering instant to the requested clustering instant. |
| void | rollbackInflightCompaction(HoodieInstant inflightInstant) |
| void | rollbackInflightCompaction(HoodieInstant inflightInstant, Function<String,Option<HoodiePendingRollbackInfo>> getPendingRollbackInstantFunc) Roll back failed compactions. |
| void | rollbackInflightLogCompaction(HoodieInstant inflightInstant) |
| void | rollbackInflightLogCompaction(HoodieInstant inflightInstant, Function<String,Option<HoodiePendingRollbackInfo>> getPendingRollbackInstantFunc) Roll back failed log compactions. |
| abstract HoodieSavepointMetadata | savepoint(HoodieEngineContext context, String instantToSavepoint, String user, String comment) Create a savepoint at the specified instant, so that the table can be restored to this point in the timeline later if needed. |
| abstract Option<HoodieCleanerPlan> | scheduleCleaning(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata) Schedule cleaning for the instant time. |
| abstract Option<HoodieClusteringPlan> | scheduleClustering(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata) Schedule clustering for the instant time. |
| abstract Option<HoodieCompactionPlan> | scheduleCompaction(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata) Schedule compaction for the instant time. |
| abstract Option<HoodieIndexPlan> | scheduleIndexing(HoodieEngineContext context, String indexInstantTime, List<MetadataPartitionType> partitionsToIndex) Schedules indexing for the table at the given instant. |
| Option<HoodieCompactionPlan> | scheduleLogCompaction(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata) Schedule log compaction for the instant time. |
| abstract Option<HoodieRestorePlan> | scheduleRestore(HoodieEngineContext context, String restoreInstantTimestamp, String savepointToRestoreTimestamp) Schedules restore for the table to the given instant. |
| abstract Option<HoodieRollbackPlan> | scheduleRollback(HoodieEngineContext context, String instantTime, HoodieInstant instantToRollback, boolean skipTimelinePublish, boolean shouldRollbackUsingMarkers, boolean isRestore) Schedule rollback for the instant time. |
| boolean | shouldTrackSuccessRecords() When HoodieTableConfig.POPULATE_META_FIELDS is enabled, written records must be tracked within WriteStatus in two cases: (1) when the HoodieIndex being used is not implicit with storage, and (2) when any of the metadata table partitions (record index, etc.) that require written record tracking are enabled. |
| abstract HoodieWriteMetadata<O> | upsert(HoodieEngineContext context, String instantTime, I records) Upsert a batch of new records into the Hoodie table at the supplied instantTime. |
| abstract HoodieWriteMetadata<O> | upsertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords) Upserts the given prepared records into the Hoodie table at the supplied instantTime. |
| void | validateInsertSchema() |
| void | validateUpsertSchema() |

protected final HoodieWriteConfig config
protected final HoodieTableMetaClient metaClient
protected final HoodieIndex<?,?> index
protected final TaskContextSupplier taskContextSupplier
protected final transient HoodieEngineContext context
protected HoodieTable(HoodieWriteConfig config, HoodieEngineContext context, HoodieTableMetaClient metaClient)
public boolean isMetadataTable()
protected abstract HoodieIndex<?,?> getIndex(HoodieWriteConfig config, HoodieEngineContext context)
protected HoodieStorageLayout getStorageLayout(HoodieWriteConfig config)
public HoodieTableMetadata getMetadata()
public abstract HoodieWriteMetadata<O> upsert(HoodieEngineContext context, String instantTime, I records)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
records - hoodieRecords to upsert
public abstract HoodieWriteMetadata<O> insert(HoodieEngineContext context, String instantTime, I records)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
records - hoodieRecords to insert
public abstract HoodieWriteMetadata<O> bulkInsert(HoodieEngineContext context, String instantTime, I records, Option<BulkInsertPartitioner> bulkInsertPartitioner)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
records - hoodieRecords to bulk insert
bulkInsertPartitioner - User Defined Partitioner
public abstract HoodieWriteMetadata<O> delete(HoodieEngineContext context, String instantTime, K keys)
public abstract HoodieWriteMetadata<O> deletePrepped(HoodieEngineContext context, String instantTime, I preppedRecords)
Delete records from Hoodie table based on HoodieKey and HoodieRecordLocation specified in preppedRecords.
Parameters:
context - HoodieEngineContext.
instantTime - Instant Time for the action.
preppedRecords - Empty records with key and locator set.
Returns:
HoodieWriteMetadata
public abstract HoodieWriteMetadata<O> deletePartitions(HoodieEngineContext context, String instantTime, List<String> partitions)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
partitions - List of partitions to be deleted
public abstract HoodieWriteMetadata<O> upsertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords)
This implementation requires that the input records are already tagged, and de-duped if needed.
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
preppedRecords - hoodieRecords to upsert
public abstract HoodieWriteMetadata<O> insertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords)
This implementation requires that the input records are already tagged, and de-duped if needed.
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
preppedRecords - hoodieRecords to insert
public abstract HoodieWriteMetadata<O> bulkInsertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords, Option<BulkInsertPartitioner> bulkInsertPartitioner)
This implementation requires that the input records are already tagged, and de-duped if needed.
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
preppedRecords - hoodieRecords to bulk insert
bulkInsertPartitioner - User Defined Partitioner
public abstract HoodieWriteMetadata<O> insertOverwrite(HoodieEngineContext context, String instantTime, I records)
Parameters:
context - HoodieEngineContext
instantTime - Instant time for the replace action
records - input records
public abstract HoodieWriteMetadata<O> insertOverwriteTable(HoodieEngineContext context, String instantTime, I records)
Parameters:
context - HoodieEngineContext
instantTime - Instant time for the replace action
records - input records
public HoodieWriteConfig getConfig()
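The method summary distinguishes insertOverwrite from insertOverwriteTable. The first replaces only the partitions present in the input. The second discards all existing records of the table first. The minimal sketch below illustrates that difference. It assumes a table modeled as an in-memory map from partition path to a list of record strings. OverwriteSketch and its methods are hypothetical and do not use any Hudi class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model: partition path -> records. Not Hudi's API.
final class OverwriteSketch {

    // Mimics insertOverwrite: only partitions that appear in the input are replaced.
    static Map<String, List<String>> insertOverwrite(Map<String, List<String>> table,
                                                     Map<String, List<String>> input) {
        Map<String, List<String>> result = new HashMap<>();
        // Keep every existing partition as-is...
        for (Map.Entry<String, List<String>> e : table.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        // ...then overwrite the partitions touched by the input.
        for (Map.Entry<String, List<String>> e : input.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return result;
    }

    // Mimics insertOverwriteTable: all existing records are dropped before inserting.
    static Map<String, List<String>> insertOverwriteTable(Map<String, List<String>> table,
                                                          Map<String, List<String>> input) {
        Map<String, List<String>> result = new HashMap<>();
        for (Map.Entry<String, List<String>> e : input.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new HashMap<>();
        table.put("2023/01/01", List.of("a", "b"));
        table.put("2023/01/02", List.of("c"));
        Map<String, List<String>> input = Map.of("2023/01/01", List.of("x"));
        // Both partitions survive; 2023/01/01 now holds only "x".
        System.out.println(insertOverwrite(table, input));
        // Only 2023/01/01 survives.
        System.out.println(insertOverwriteTable(table, input));
    }
}
```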
public HoodieTableMetaClient getMetaClient()
public boolean isPartitioned()
public org.apache.hadoop.conf.Configuration getHadoopConf()
public TableFileSystemView getFileSystemView()
public TableFileSystemView.BaseFileOnlyView getBaseFileOnlyView()
public TableFileSystemView.SliceView getSliceView()
public SyncableFileSystemView getHoodieView()
public HoodieTimeline getCompletedCommitsTimeline()
public HoodieTimeline getCompletedCommitTimeline()
public HoodieTimeline getPendingCommitTimeline()
public HoodieTimeline getCompletedCleanTimeline()
public HoodieTimeline getCleanTimeline()
public HoodieTimeline getRollbackTimeline()
public HoodieTimeline getRestoreTimeline()
public HoodieTimeline getCompletedSavepointTimeline()
public Set<String> getSavepointTimestamps()
public HoodieActiveTimeline getActiveTimeline()
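The timeline accessors above differ mainly in which actions they keep and whether they keep only completed instants. The hedged sketch below makes that concrete over a hypothetical Instant model (timestamp, action, state). The filters follow the one-line descriptions in the method summary:

- commit + deltacommit for getCompletedCommitsTimeline
- non-completed commits for getPendingCommitTimeline
- completed savepoints for getSavepointTimestamps

Hudi's real HoodieTimeline filtering handles more actions and states than this.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical, simplified instant model for illustration only.
final class TimelineSketch {

    enum State { REQUESTED, INFLIGHT, COMPLETED }

    static final class Instant {
        final String timestamp;
        final String action;
        final State state;

        Instant(String timestamp, String action, State state) {
            this.timestamp = timestamp;
            this.action = action;
            this.state = state;
        }
    }

    // Completed commit + deltacommit instants (cf. getCompletedCommitsTimeline()).
    static List<String> completedCommits(List<Instant> timeline) {
        return timeline.stream()
                .filter(i -> i.state == State.COMPLETED)
                .filter(i -> i.action.equals("commit") || i.action.equals("deltacommit"))
                .map(i -> i.timestamp)
                .collect(Collectors.toList());
    }

    // Commit instants that have not completed yet (cf. getPendingCommitTimeline()).
    static List<String> pendingCommits(List<Instant> timeline) {
        return timeline.stream()
                .filter(i -> i.state != State.COMPLETED)
                .filter(i -> i.action.equals("commit"))
                .map(i -> i.timestamp)
                .collect(Collectors.toList());
    }

    // Timestamps of completed savepoints, sorted (cf. getSavepointTimestamps()).
    static Set<String> savepointTimestamps(List<Instant> timeline) {
        return timeline.stream()
                .filter(i -> i.state == State.COMPLETED && i.action.equals("savepoint"))
                .map(i -> i.timestamp)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<Instant> timeline = List.of(
                new Instant("001", "commit", State.COMPLETED),
                new Instant("002", "deltacommit", State.COMPLETED),
                new Instant("001", "savepoint", State.COMPLETED), // savepoint taken on commit 001
                new Instant("004", "commit", State.INFLIGHT));
        System.out.println(completedCommits(timeline));    // [001, 002]
        System.out.println(pendingCommits(timeline));      // [004]
        System.out.println(savepointTimestamps(timeline)); // [001]
    }
}
```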
public HoodieIndex<?,?> getIndex()
public HoodieStorageLayout getStorageLayout()
public abstract Option<HoodieCompactionPlan> scheduleCompaction(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for scheduling compaction
extraMetadata - additional metadata to write into plan
public abstract HoodieWriteMetadata<O> compact(HoodieEngineContext context, String compactionInstantTime)
Parameters:
context - HoodieEngineContext
compactionInstantTime - Instant Time
public Option<HoodieCompactionPlan> scheduleLogCompaction(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for scheduling log compaction
extraMetadata - additional metadata to write into plan
public HoodieWriteMetadata<O> logCompact(HoodieEngineContext context, String logCompactionInstantTime)
Parameters:
context - HoodieEngineContext
logCompactionInstantTime - Instant Time
public abstract Option<HoodieClusteringPlan> scheduleClustering(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for scheduling clustering
extraMetadata - additional metadata to write into plan
public abstract HoodieWriteMetadata<O> cluster(HoodieEngineContext context, String clusteringInstantTime)
Parameters:
context - HoodieEngineContext
clusteringInstantTime - Instant Time
public abstract HoodieBootstrapWriteMetadata<O> bootstrap(HoodieEngineContext context, Option<Map<String,String>> extraMetadata)
Parameters:
context - HoodieEngineContext
extraMetadata - Additional Metadata for storing in commit file.
public abstract void rollbackBootstrap(HoodieEngineContext context, String instantTime)
Parameters:
context - HoodieEngineContext
public abstract Option<HoodieCleanerPlan> scheduleCleaning(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for scheduling cleaning
extraMetadata - additional metadata to write into plan
@Deprecated
public HoodieCleanMetadata clean(HoodieEngineContext context, String cleanInstantTime, boolean skipLocking)
public abstract HoodieCleanMetadata clean(HoodieEngineContext context, String cleanInstantTime)
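Several table services on this class come in two phases:

1. A schedule method (scheduleCleaning, scheduleCompaction, scheduleClustering, scheduleLogCompaction) returns an Option of a plan.
2. A separate method (clean, compact, cluster, logCompact) then does the work at a given instant time.

The sketch below shows that pattern with java.util.Optional standing in for Hudi's Option. The Plan class is hypothetical. So is the trigger rule ("a file group's log-file count reaches a threshold"), which is not Hudi's compaction strategy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical two-phase "schedule, then execute" table service.
final class ScheduleExecuteSketch {

    static final class Plan {
        final String instantTime;
        final List<String> fileGroups;

        Plan(String instantTime, List<String> fileGroups) {
            this.instantTime = instantTime;
            this.fileGroups = fileGroups;
        }
    }

    // Phase 1: decide what to do; empty when there is no work (like an empty Option<...Plan>).
    static Optional<Plan> schedule(String instantTime, Map<String, Integer> logFilesPerGroup, int threshold) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Integer> e : logFilesPerGroup.entrySet()) {
            if (e.getValue() >= threshold) {
                selected.add(e.getKey());
            }
        }
        return selected.isEmpty() ? Optional.empty() : Optional.of(new Plan(instantTime, selected));
    }

    // Phase 2: carry out a previously scheduled plan; here we just reset the log counts.
    static void execute(Plan plan, Map<String, Integer> logFilesPerGroup) {
        for (String fileGroup : plan.fileGroups) {
            logFilesPerGroup.put(fileGroup, 0);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> logs = new LinkedHashMap<>();
        logs.put("fg-1", 5);
        logs.put("fg-2", 1);
        Optional<Plan> plan = schedule("20230101000000", logs, 3);
        plan.ifPresent(p -> execute(p, logs));
        System.out.println(logs); // {fg-1=0, fg-2=1}
    }
}
```

Splitting the phases lets the plan be persisted and executed later, or by a different process, which is presumably why each schedule method returns a plan rather than doing the work directly.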
public abstract Option<HoodieRollbackPlan> scheduleRollback(HoodieEngineContext context, String instantTime, HoodieInstant instantToRollback, boolean skipTimelinePublish, boolean shouldRollbackUsingMarkers, boolean isRestore)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for scheduling rollback
instantToRollback - instant to be rolled back
shouldRollbackUsingMarkers - uses the marker-based rollback strategy when set to true; uses list-based rollback when false.
public abstract HoodieRollbackMetadata rollback(HoodieEngineContext context, String rollbackInstantTime, HoodieInstant commitInstant, boolean deleteInstants, boolean skipLocking)
Four steps: (1) atomically unpublish this commit, (2) clean indexing data, (3) clean newly generated parquet files, and (4) finally delete the .commit or .inflight file, if deleteInstants = true.
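The four rollback steps can be traced with a small in-memory simulation. Storage is modeled as sets of names. The naming scheme is an assumption made for this sketch: "<commit>_<fileId>.parquet" for data files and "<commit>:<key>" for index entries. The sketch is single-threaded, so it does not model the atomicity of step (1), and none of its names are Hudi APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical simulation of the documented rollback steps.
final class RollbackSketch {

    final Set<String> publishedCommits = new TreeSet<>();
    final Set<String> indexEntries = new TreeSet<>(); // "<commit>:<recordKey>"
    final Set<String> dataFiles = new TreeSet<>();    // "<commit>_<fileId>.parquet"
    final Set<String> instantFiles = new TreeSet<>(); // "<commit>.commit" or "<commit>.inflight"
    final List<String> steps = new ArrayList<>();

    void rollback(String commit, boolean deleteInstants) {
        // (1) Unpublish the commit so readers no longer see it.
        publishedCommits.remove(commit);
        steps.add("unpublish");
        // (2) Clean index data written by the commit.
        indexEntries.removeIf(e -> e.startsWith(commit + ":"));
        steps.add("clean-index");
        // (3) Delete the data files the commit newly generated.
        dataFiles.removeIf(f -> f.startsWith(commit + "_"));
        steps.add("delete-data-files");
        // (4) Only when requested, delete the instant's .commit / .inflight files.
        if (deleteInstants) {
            instantFiles.remove(commit + ".commit");
            instantFiles.remove(commit + ".inflight");
            steps.add("delete-instant-files");
        }
    }

    public static void main(String[] args) {
        RollbackSketch table = new RollbackSketch();
        table.publishedCommits.addAll(List.of("001", "002"));
        table.indexEntries.addAll(List.of("001:k1", "002:k2"));
        table.dataFiles.addAll(List.of("001_f1.parquet", "002_f2.parquet"));
        table.instantFiles.addAll(List.of("001.commit", "002.inflight"));
        table.rollback("002", true);
        System.out.println(table.dataFiles);    // [001_f1.parquet]
        System.out.println(table.instantFiles); // [001.commit]
    }
}
```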
public abstract Option<HoodieIndexPlan> scheduleIndexing(HoodieEngineContext context, String indexInstantTime, List<MetadataPartitionType> partitionsToIndex)
Parameters:
context - HoodieEngineContext
indexInstantTime - Instant time for scheduling index action.
partitionsToIndex - List of MetadataPartitionType that should be indexed.
public abstract Option<HoodieIndexCommitMetadata> index(HoodieEngineContext context, String indexInstantTime)
Parameters:
context - HoodieEngineContext
indexInstantTime - Instant time for which index action was scheduled.
public abstract HoodieSavepointMetadata savepoint(HoodieEngineContext context, String instantToSavepoint, String user, String comment)
public abstract HoodieRestoreMetadata restore(HoodieEngineContext context, String restoreInstantTimestamp, String savepointToRestoreTimestamp)
public abstract Option<HoodieRestorePlan> scheduleRestore(HoodieEngineContext context, String restoreInstantTimestamp, String savepointToRestoreTimestamp)
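savepoint and restore work as a pair: a savepoint marks an instant that the table can later be restored to. One simplified way to picture a restore is as rolling back every instant newer than the savepoint, newest first. The sketch below assumes that model and fixed-width timestamp strings, so that lexicographic order matches chronological order. It is not Hudi's restore implementation; the actual API also exposes scheduleRestore, which produces a HoodieRestorePlan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model of "restore to savepoint"; illustrative only.
final class RestoreSketch {

    // Returns the instants that would have to be undone, most recent first.
    static List<String> instantsToRollback(List<String> instantTimes, String savepointTime) {
        List<String> newer = new ArrayList<>();
        for (String ts : instantTimes) {
            // Assumes fixed-width timestamps, so string order equals time order.
            if (ts.compareTo(savepointTime) > 0) {
                newer.add(ts);
            }
        }
        // Undo the most recent change first.
        newer.sort(Collections.reverseOrder());
        return newer;
    }

    public static void main(String[] args) {
        List<String> timeline = List.of("20230101000000", "20230102000000", "20230103000000");
        System.out.println(instantsToRollback(timeline, "20230101000000"));
        // [20230103000000, 20230102000000]
    }
}
```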
public void rollbackInflightCompaction(HoodieInstant inflightInstant)
public void rollbackInflightLogCompaction(HoodieInstant inflightInstant)
public void rollbackInflightCompaction(HoodieInstant inflightInstant, Function<String,Option<HoodiePendingRollbackInfo>> getPendingRollbackInstantFunc)
Parameters:
inflightInstant - Inflight Compaction Instant
public void rollbackInflightClustering(HoodieInstant inflightInstant, Function<String,Option<HoodiePendingRollbackInfo>> getPendingRollbackInstantFunc)
Parameters:
inflightInstant - Inflight clustering instant
getPendingRollbackInstantFunc - Function to get rollback instant
public void rollbackInflightLogCompaction(HoodieInstant inflightInstant, Function<String,Option<HoodiePendingRollbackInfo>> getPendingRollbackInstantFunc)
Parameters:
inflightInstant - Inflight log compaction instant
public void finalizeWrite(HoodieEngineContext context, String instantTs, List<HoodieWriteStat> stats) throws HoodieIOException
Parameters:
context - HoodieEngineContext
stats - List of HoodieWriteStats
Throws:
HoodieIOException - if some paths can't be finalized on storage
protected Set<String> getInvalidDataPaths(WriteMarkers markers) throws IOException
Throws:
IOException
protected void reconcileAgainstMarkers(HoodieEngineContext context, String instantTs, List<HoodieWriteStat> stats, boolean consistencyCheckEnabled) throws HoodieIOException
Parameters:
context - HoodieEngineContext
instantTs - Instant Timestamp
stats - Hoodie Write Stat
consistencyCheckEnabled - Consistency Check Enabled
Throws:
HoodieIOException
public static ConsistencyGuard getConsistencyGuard(org.apache.hadoop.fs.FileSystem fs, ConsistencyGuardConfig consistencyGuardConfig) throws IOException
Instantiate ConsistencyGuard based on configs.
The default ConsistencyGuard class is OptimisticConsistencyGuard.
Throws:
IOException
public TaskContextSupplier getTaskContextSupplier()
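reconcileAgainstMarkers and getInvalidDataPaths are documented as detecting and deleting duplicate data files created by Spark retries. The hedged sketch below reduces this to a set difference. Paths recovered from marker files minus paths reported in the write stats are treated as invalid and deleted from a simulated storage. The class, method names, and file names are invented for illustration. Hudi's actual implementation also deals with marker types and consistency checks.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of marker-based reconciliation; not Hudi's implementation.
final class MarkerReconcileSketch {

    // Files that markers say were started but that no write stat reports as written.
    static Set<String> invalidDataPaths(Collection<String> markerDataPaths,
                                        Collection<String> writeStatPaths) {
        Set<String> invalid = new TreeSet<>(markerDataPaths);
        invalid.removeAll(new HashSet<>(writeStatPaths));
        return invalid;
    }

    // Deletes the invalid files from a simulated storage and returns what was deleted.
    static Set<String> reconcile(Set<String> storage, Collection<String> markerDataPaths,
                                 Collection<String> writeStatPaths) {
        Set<String> toDelete = invalidDataPaths(markerDataPaths, writeStatPaths);
        storage.removeAll(toDelete);
        return toDelete;
    }

    public static void main(String[] args) {
        // Two attempts of the same task wrote files; only the second attempt
        // is reported in the write stats.
        Set<String> storage = new TreeSet<>(Set.of("p1/fg1_attempt0.parquet", "p1/fg1_attempt1.parquet"));
        Set<String> markers = Set.of("p1/fg1_attempt0.parquet", "p1/fg1_attempt1.parquet");
        Set<String> stats = Set.of("p1/fg1_attempt1.parquet");
        System.out.println(reconcile(storage, markers, stats)); // [p1/fg1_attempt0.parquet]
        System.out.println(storage);                            // [p1/fg1_attempt1.parquet]
    }
}
```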
public void validateUpsertSchema() throws HoodieUpsertException
Throws:
HoodieUpsertException
public void validateInsertSchema() throws HoodieInsertException
Throws:
HoodieInsertException
public HoodieFileFormat getBaseFileFormat()
public HoodieFileFormat getLogFileFormat()
public Option<HoodieFileFormat> getPartitionMetafileFormat()
public String getBaseFileExtension()
public boolean requireSortedRecords()
public HoodieEngineContext getContext()
public final Option<HoodieTableMetadataWriter> getMetadataWriter(String triggeringInstantTimestamp)
Parameters:
triggeringInstantTimestamp - The instant that is triggering this metadata write
Returns:
HoodieTableMetadataWriter
public Option<HoodieTableMetadataWriter> getIndexingMetadataWriter(String triggeringInstantTimestamp)
Parameters:
triggeringInstantTimestamp - The instant that is triggering this metadata write.
Returns:
HoodieTableMetadataWriter.
protected Option<HoodieTableMetadataWriter> getMetadataWriter(String triggeringInstantTimestamp, HoodieFailedWritesCleaningPolicy failedWritesCleaningPolicy)
Note: Gets the metadata writer for the conf. If the metadata table doesn't exist, this will trigger the creation of the table and the initial bootstrapping. Since this call is under the transaction lock, other concurrent writers are blocked from performing the same initial metadata table creation and bootstrapping.
Parameters:
triggeringInstantTimestamp - The instant that is triggering this metadata write
failedWritesCleaningPolicy - Cleaning policy on failed writes
Returns:
HoodieTableMetadataWriter
public void maybeDeleteMetadataTable()
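The note on the protected getMetadataWriter describes create-on-first-use under the transaction lock. Only one writer performs the initial metadata table creation and bootstrapping. The sketch below illustrates that idea with a ReentrantLock standing in for the transaction lock. The class, the String "writer" it returns, and the bootstrap counter are hypothetical and are not part of Hudi.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical: create-and-bootstrap on first use while holding a lock, so that
// concurrent writers cannot both run the initial bootstrap.
final class LazyMetadataTableSketch {

    private final ReentrantLock txnLock = new ReentrantLock(); // stand-in for the transaction lock
    private boolean tableExists = false;                       // guarded by txnLock
    final AtomicInteger bootstrapRuns = new AtomicInteger();

    String getMetadataWriter(String triggeringInstant) {
        txnLock.lock();
        try {
            if (!tableExists) {
                // First caller creates and bootstraps the table; later callers skip this.
                bootstrapRuns.incrementAndGet();
                tableExists = true;
            }
            return "writer@" + triggeringInstant;
        } finally {
            txnLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LazyMetadataTableSketch table = new LazyMetadataTableSketch();
        Thread[] writers = new Thread[4];
        for (int i = 0; i < writers.length; i++) {
            String instant = "00" + i;
            writers[i] = new Thread(() -> table.getMetadataWriter(instant));
            writers[i].start();
        }
        for (Thread w : writers) {
            w.join();
        }
        System.out.println(table.bootstrapRuns.get()); // 1
    }
}
```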
public void deleteMetadataIndexIfNecessary()
public HoodieTableMetadata getMetadataTable()
public boolean shouldTrackSuccessRecords()
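When HoodieTableConfig.POPULATE_META_FIELDS is enabled, we need to track written records within WriteStatus in two cases:
(1) when the HoodieIndex being used is not implicit with storage;
(2) if any of the metadata table partitions (record index, etc.) that require written record tracking are enabled.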
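The documented rule for shouldTrackSuccessRecords can be written as a boolean predicate. In the sketch below, the three boolean parameters are assumed stand-ins for the table config, the index, and the metadata-table settings. Returning false when meta fields are not populated is also an assumption, since the documentation only describes the enabled case.

```java
// Hypothetical restatement of the documented rule; not Hudi's implementation.
final class TrackSuccessRecordsSketch {

    static boolean shouldTrackSuccessRecords(boolean populateMetaFields,
                                             boolean indexImplicitWithStorage,
                                             boolean recordTrackingPartitionEnabled) {
        // Assumption: the documentation only covers populateMetaFields == true.
        if (!populateMetaFields) {
            return false;
        }
        // Case 1: the index is not implicit with storage.
        // Case 2: a metadata-table partition that needs written-record tracking is enabled.
        return !indexImplicitWithStorage || recordTrackingPartitionEnabled;
    }

    public static void main(String[] args) {
        System.out.println(shouldTrackSuccessRecords(true, true, false));  // false
        System.out.println(shouldTrackSuccessRecords(true, false, false)); // true
        System.out.println(shouldTrackSuccessRecords(true, true, true));   // true
    }
}
```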
public Runnable getPreExecuteRunnable()
Copyright © 2023 The Apache Software Foundation. All rights reserved.