Class CreateCheckpointIterator

Object
io.delta.kernel.internal.replay.CreateCheckpointIterator
All Implemented Interfaces:
CloseableIterator<FilteredColumnarBatch>, Closeable, AutoCloseable, Iterator<FilteredColumnarBatch>

public class CreateCheckpointIterator extends Object implements CloseableIterator<FilteredColumnarBatch>
Replays a history of actions from the transaction log to reconstruct the checkpoint state of the table. The rules for constructing the checkpoint state are defined in the Delta Protocol: Checkpoint Reconciliation Rules.

Currently, the following rules are implemented:

  • The latest protocol action seen wins
  • The latest metaData action seen wins
  • For txn actions, the latest version seen for a given appId wins
  • Logical files in a table are identified by their (path, deletionVector.uniqueId) primary key. File actions (add or remove) reference logical files, and a log can contain any number of references to a single file.
  • To replay the log, scan all file actions and keep only the newest reference for each logical file.
  • add actions in the result identify logical files currently present in the table (for queries). remove actions in the result identify tombstones of logical files no longer present in the table (for VACUUM).
  • commitInfo actions are not included in the checkpoint state.
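
The file-action rules above can be illustrated with a minimal sketch. It is not the code used by CreateCheckpointIterator: the FileAction class, its fields, and the reconcile method are hypothetical simplifications, and the sketch assumes the input list is already ordered from the newest commit to the oldest. Under that assumption, the first reference seen for each (path, deletionVector.uniqueId) key is the newest one, so later occurrences of the same key can be dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FileActionReplaySketch {
    /** Hypothetical, simplified file action; not Delta Kernel's actual row schema. */
    static final class FileAction {
        final boolean isAdd;      // true for an add action, false for a remove action
        final String path;
        final String dvUniqueId;  // null when the file has no deletion vector

        FileAction(boolean isAdd, String path, String dvUniqueId) {
            this.isAdd = isAdd;
            this.path = path;
            this.dvUniqueId = dvUniqueId;
        }

        /** Primary key of the logical file: (path, deletionVector.uniqueId). */
        List<String> key() {
            return Arrays.asList(path, dvUniqueId);
        }

        @Override
        public String toString() {
            return (isAdd ? "add" : "remove") + "(" + path + ", dv=" + dvUniqueId + ")";
        }
    }

    /**
     * Keeps only the newest reference to each logical file.
     * Assumes newestFirst is ordered from the newest commit to the oldest.
     */
    static List<FileAction> reconcile(List<FileAction> newestFirst) {
        Set<List<String>> seen = new HashSet<>();
        List<FileAction> result = new ArrayList<>();
        for (FileAction action : newestFirst) {
            // Set.add returns false if the key was already seen in a newer action.
            if (seen.add(action.key())) {
                result.add(action);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<FileAction> log = Arrays.asList(
            new FileAction(false, "a.parquet", null),   // newest: tombstone for a.parquet without a DV
            new FileAction(true, "a.parquet", "dv1"),   // a.parquet with a DV is a separate logical file
            new FileAction(true, "b.parquet", null),
            new FileAction(true, "a.parquet", null));   // oldest: superseded by the tombstone
        for (FileAction action : reconcile(log)) {
            System.out.println(action);
        }
    }
}
```

In the result, surviving add entries correspond to files currently in the table and surviving remove entries correspond to tombstones. The same path with a different deletion vector is treated as a different logical file because the key includes the deletion vector's unique id.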

The following rules are not yet implemented. They will be added as support for more table features is introduced over time.

  • For domainMetadata, the latest domainMetadata seen for a given domain wins.
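
As an illustration of what a "latest seen wins" rule means for domainMetadata, here is a minimal sketch. It does not reflect any existing Delta Kernel code: the DomainMetadata class, its two fields, and latestPerDomain are hypothetical, and the input is assumed to be ordered from the newest commit to the oldest.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DomainMetadataReplaySketch {
    /** Hypothetical, simplified domainMetadata action. */
    static final class DomainMetadata {
        final String domain;
        final String configuration;

        DomainMetadata(String domain, String configuration) {
            this.domain = domain;
            this.configuration = configuration;
        }
    }

    /**
     * Returns the newest action for each domain.
     * Assumes newestFirst is ordered from the newest commit to the oldest.
     */
    static Map<String, DomainMetadata> latestPerDomain(List<DomainMetadata> newestFirst) {
        Map<String, DomainMetadata> latest = new LinkedHashMap<>();
        for (DomainMetadata action : newestFirst) {
            // putIfAbsent keeps the first (newest) entry and ignores older ones.
            latest.putIfAbsent(action.domain, action);
        }
        return latest;
    }
}
```

The same keep-first-per-key shape would apply to txn actions keyed by appId.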