Package io.delta.kernel.internal.replay
Class CreateCheckpointIterator
Object
io.delta.kernel.internal.replay.CreateCheckpointIterator
- All Implemented Interfaces:
CloseableIterator<FilteredColumnarBatch>, Closeable, AutoCloseable, Iterator<FilteredColumnarBatch>
public class CreateCheckpointIterator
extends Object
implements CloseableIterator<FilteredColumnarBatch>
Replays a history of actions from the transaction log to reconstruct the checkpoint state of the
table. The rules for constructing the checkpoint state are defined in the Delta Protocol:
Checkpoint Reconciliation Rules.
Currently, the following rules are implemented:
- The latest protocol action seen wins
- The latest metaData action seen wins
- For txn actions, the latest version seen for a given appId wins
- Logical files in a table are identified by their (path, deletionVector.uniqueId) primary key. File actions (add or remove) reference logical files, and a log can contain any number of references to a single file.
- To replay the log, scan all file actions and keep only the newest reference for each logical file.
- add actions in the result identify logical files currently present in the table (for queries). remove actions in the result identify tombstones of logical files no longer present in the table (for VACUUM).
- commit info actions are not included
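The file-action rules above can be sketched as follows. This is a minimal illustration, not the Delta Kernel implementation: `FileAction`, `reconcile`, and the class name are hypothetical, and the sketch assumes the actions are supplied newest first, so the first reference seen for a (path, deletionVector.uniqueId) key is the one that wins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FileActionReconcileSketch {
    /** Hypothetical, simplified file action; not a Delta Kernel type. */
    static final class FileAction {
        final boolean isAdd;      // true for an add action, false for a remove action
        final String path;
        final String dvUniqueId;  // null when the file has no deletion vector
        final long version;       // commit version the action was read from

        FileAction(boolean isAdd, String path, String dvUniqueId, long version) {
            this.isAdd = isAdd;
            this.path = path;
            this.dvUniqueId = dvUniqueId;
            this.version = version;
        }

        /** Primary key of the logical file: (path, deletionVector.uniqueId). */
        List<String> key() {
            return Arrays.asList(path, dvUniqueId);
        }
    }

    /**
     * Assumes the input is ordered newest first. Keeps only the first
     * (that is, newest) reference seen for each logical file.
     */
    static List<FileAction> reconcile(List<FileAction> newestFirst) {
        Set<List<String>> seen = new HashSet<>();
        List<FileAction> result = new ArrayList<>();
        for (FileAction action : newestFirst) {
            if (seen.add(action.key())) {
                result.add(action);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<FileAction> log = new ArrayList<>();
        log.add(new FileAction(false, "part-0.parquet", null, 3)); // newest: remove
        log.add(new FileAction(true, "part-1.parquet", null, 2));
        log.add(new FileAction(true, "part-0.parquet", null, 1));  // superseded by the remove
        for (FileAction a : reconcile(log)) {
            System.out.println((a.isAdd ? "add " : "remove ") + a.path + " @v" + a.version);
        }
    }
}
```

In the result, `part-0.parquet` survives only as a remove (a tombstone) and `part-1.parquet` as an add. The same "first seen wins" pattern would apply to txn actions keyed by appId.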
The following rules are not yet implemented. They will be added as support for more table features is introduced over time.
- For domainMetadata, the latest domainMetadata seen for a given domain wins.
Constructor Summary
Constructors
CreateCheckpointIterator(Engine engine, LogSegment logSegment, long minFileRetentionTimestampMillis)
Method Summary
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface io.delta.kernel.utils.CloseableIterator
combine, filter, map
Methods inherited from interface java.util.Iterator
forEachRemaining, remove
Constructor Details
CreateCheckpointIterator
public CreateCheckpointIterator(Engine engine, LogSegment logSegment, long minFileRetentionTimestampMillis)
Method Details
hasNext
public boolean hasNext()
Description copied from interface: CloseableIterator
Returns true if the iteration has more elements. (In other words, returns true if next would return an element rather than throwing an exception.)
- Specified by: hasNext in interface CloseableIterator<FilteredColumnarBatch>
- Specified by: hasNext in interface Iterator<FilteredColumnarBatch>
- Returns: true if the iteration has more elements
next
public FilteredColumnarBatch next()
Description copied from interface: CloseableIterator
Returns the next element in the iteration.
- Specified by: next in interface CloseableIterator<FilteredColumnarBatch>
- Specified by: next in interface Iterator<FilteredColumnarBatch>
- Returns: the next element in the iteration
close
public void close() throws IOException
- Specified by: close in interface AutoCloseable
- Specified by: close in interface Closeable
- Throws: IOException
getNumberOfAddActions
public long getNumberOfAddActions()
Returns the number of add file actions in the final checkpoint. Call this only after the iterator has been fully consumed.
- Returns: the number of add file actions in the checkpoint
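The "consume first, then read the count" contract can be illustrated with a small stand-in. Constructing a real CreateCheckpointIterator requires an Engine and a LogSegment, so this sketch uses a hypothetical `CountingBatchIterator` instead; its batches are plain boolean arrays (true meaning "add action"), and it is assumed, for illustration only, that the count is accumulated as batches are produced.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class DrainThenCountSketch {
    /** Hypothetical stand-in for a counting iterator; not a Delta Kernel type. */
    static final class CountingBatchIterator implements Iterator<boolean[]>, Closeable {
        private final Iterator<boolean[]> batches;
        private long numberOfAddActions = 0;
        boolean closed = false;

        CountingBatchIterator(List<boolean[]> batches) {
            this.batches = batches.iterator();
        }

        @Override
        public boolean hasNext() {
            return batches.hasNext();
        }

        @Override
        public boolean[] next() {
            boolean[] batch = batches.next();
            for (boolean isAdd : batch) {
                if (isAdd) {
                    numberOfAddActions++; // count grows only as batches are returned
                }
            }
            return batch;
        }

        @Override
        public void close() {
            closed = true;
        }

        long getNumberOfAddActions() {
            return numberOfAddActions;
        }
    }

    /** Drains the iterator, reads the count, and closes the iterator on exit. */
    static long drainAndCount(CountingBatchIterator it) {
        try (CountingBatchIterator iter = it) {
            while (iter.hasNext()) {
                iter.next(); // a real caller would write each batch to the checkpoint file
            }
            return iter.getNumberOfAddActions();
        }
    }

    public static void main(String[] args) {
        List<boolean[]> batches =
            Arrays.asList(new boolean[] {true, false}, new boolean[] {true, true});
        System.out.println(drainAndCount(new CountingBatchIterator(batches)));
    }
}
```

Reading the count before the loop finishes would return a partial value in this stand-in, which is why the count is read only after `hasNext()` returns false.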