Package io.delta.kernel.internal.replay
Class LogReplay
Object
io.delta.kernel.internal.replay.LogReplay
Replays a history of actions, resolving them to produce the current state of the table. The
protocol for resolution is as follows:
- The most recent AddFile and accompanying metadata for any `(path, dv id)` tuple wins.
- RemoveFile deletes a corresponding AddFile. A RemoveFile "corresponds" to
the AddFile that matches both the parquet file URI *and* the deletion vector's URI (if any).
- The most recent Metadata wins.
- The most recent Protocol version wins.
- For each `(path, dv id)` tuple, this class should always output only one FileAction (either AddFile or RemoveFile).
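The file-action rules above can be illustrated with a small, self-contained sketch. It is not the Kernel implementation: `FileActionReplaySketch`, `FileAction`, and `activeAddFiles` are hypothetical names, and the sketch assumes actions are visited newest first and that a null `dvId` means "no deletion vector".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of file-action reconciliation; not the Kernel code. */
class FileActionReplaySketch {

  /** Stand-in for an AddFile or RemoveFile. dvId is null when there is no deletion vector. */
  static final class FileAction {
    final boolean isAdd;
    final String path;
    final String dvId;

    FileAction(boolean isAdd, String path, String dvId) {
      this.isAdd = isAdd;
      this.path = path;
      this.dvId = dvId;
    }
  }

  /**
   * Assumes the input is ordered newest first. The first action seen for a
   * (path, dvId) key decides that key: an add makes the file active, a remove
   * tombstones it. Older actions for the same key are ignored.
   */
  static List<FileAction> activeAddFiles(List<FileAction> newestFirst) {
    Set<List<String>> seenKeys = new HashSet<>();
    List<FileAction> active = new ArrayList<>();
    for (FileAction action : newestFirst) {
      // Arrays.asList tolerates a null dvId and provides equals/hashCode for the pair.
      List<String> key = Arrays.asList(action.path, action.dvId);
      boolean firstTimeSeen = seenKeys.add(key);
      if (firstTimeSeen && action.isAdd) {
        active.add(action);
      }
    }
    return active;
  }
}
```

Because the key includes the deletion vector id, an add of `(b.parquet, dv1)` is not cancelled by a remove of `(b.parquet, null)`; the two tuples are resolved independently.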
This class exposes the following public APIs:
- getProtocol(): latest non-null Protocol
- getMetadata(): latest non-null Metadata
- getAddFilesAsColumnarBatches(io.delta.kernel.engine.Engine, boolean, java.util.Optional<io.delta.kernel.expressions.Predicate>): returns all active (not tombstoned) AddFiles as ColumnarBatch objects
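The "latest non-null Protocol" and "latest non-null Metadata" rules can be sketched in the same spirit. This is an illustrative model only: `ProtocolMetadataSketch`, `Commit`, and `latestProtocolAndMetadata` are hypothetical names, protocol and metadata are represented as plain strings, and the sketch assumes commits are scanned newest first.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of resolving the latest Protocol and Metadata; not the Kernel code. */
class ProtocolMetadataSketch {

  /** Stand-in for one commit. Either field is null if the commit did not set it. */
  static final class Commit {
    final String protocol;
    final String metadata;

    Commit(String protocol, String metadata) {
      this.protocol = protocol;
      this.metadata = metadata;
    }
  }

  /**
   * Assumes the input is ordered newest first. Returns a two-element array
   * {protocol, metadata} holding the first non-null value found for each,
   * and stops scanning as soon as both are known.
   */
  static String[] latestProtocolAndMetadata(List<Commit> newestFirst) {
    String protocol = null;
    String metadata = null;
    for (Commit commit : newestFirst) {
      if (protocol == null && commit.protocol != null) {
        protocol = commit.protocol;
      }
      if (metadata == null && commit.metadata != null) {
        metadata = commit.metadata;
      }
      if (protocol != null && metadata != null) {
        break; // Older commits cannot change the result.
      }
    }
    return new String[] {protocol, metadata};
  }
}
```

The two values are resolved independently, so the latest Protocol and the latest Metadata may come from different commits.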
Field Summary
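| Modifier and Type | Field | Description |
| --- | --- | --- |
| static int | ADD_FILE_DV_ORDINAL | |
| static int | ADD_FILE_ORDINAL | |
| static int | ADD_FILE_PATH_ORDINAL | |
| static String | ADDFILE_FIELD_NAME | |
| static final StructType | PROTOCOL_METADATA_READ_SCHEMA | Read schema when searching for the latest Protocol and Metadata. |
| static int | REMOVE_FILE_DV_ORDINAL | |
| static int | REMOVE_FILE_ORDINAL | |
| static int | REMOVE_FILE_PATH_ORDINAL | |
| static String | REMOVEFILE_FIELD_NAME | |
| static final StructType | SET_TRANSACTION_READ_SCHEMA | Read schema when searching for just the transaction identifiers. |
| static String | SIDECAR_FIELD_NAME | |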
Constructor Summary
| Constructor | Description |
| --- | --- |
| LogReplay(Path logPath, Path dataPath, long snapshotVersion, Engine engine, LogSegment logSegment, Optional<SnapshotHint> snapshotHint) | |
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| static boolean | containsAddOrRemoveFileActions | |
| CloseableIterator<FilteredColumnarBatch> | getAddFilesAsColumnarBatches(Engine engine, boolean shouldReadStats, Optional<Predicate> checkpointPredicate) | Returns an iterator of FilteredColumnarBatch representing all the active AddFiles in the table. |
| static StructType | getAddRemoveReadSchema(boolean shouldReadStats) | Read schema when searching for all the active AddFiles. |
| | getLatestTransactionIdentifier(Engine engine, String applicationId) | |
| | getMetadata() | |
| | getProtocol() | |
| static StructType | withSidecarFileSchema(StructType schema) | |
Field Details
- PROTOCOL_METADATA_READ_SCHEMA
  Read schema when searching for the latest Protocol and Metadata.
- SET_TRANSACTION_READ_SCHEMA
  Read schema when searching for just the transaction identifiers.
- SIDECAR_FIELD_NAME
- ADDFILE_FIELD_NAME
- REMOVEFILE_FIELD_NAME
- ADD_FILE_ORDINAL
  public static int ADD_FILE_ORDINAL
- ADD_FILE_PATH_ORDINAL
  public static int ADD_FILE_PATH_ORDINAL
- ADD_FILE_DV_ORDINAL
  public static int ADD_FILE_DV_ORDINAL
- REMOVE_FILE_ORDINAL
  public static int REMOVE_FILE_ORDINAL
- REMOVE_FILE_PATH_ORDINAL
  public static int REMOVE_FILE_PATH_ORDINAL
- REMOVE_FILE_DV_ORDINAL
  public static int REMOVE_FILE_DV_ORDINAL
Constructor Details
- LogReplay
  public LogReplay(Path logPath, Path dataPath, long snapshotVersion, Engine engine, LogSegment logSegment, Optional<SnapshotHint> snapshotHint)
Method Details
- withSidecarFileSchema
- containsAddOrRemoveFileActions
- getAddRemoveReadSchema
  Read schema when searching for all the active AddFiles.
- getProtocol
- getMetadata
- getLatestTransactionIdentifier
- getAddFilesAsColumnarBatches
  public CloseableIterator<FilteredColumnarBatch> getAddFilesAsColumnarBatches(Engine engine, boolean shouldReadStats, Optional<Predicate> checkpointPredicate)
  Returns an iterator of FilteredColumnarBatch representing all the active AddFiles in the table. Statistics are conditionally read for the AddFiles based on shouldReadStats. The returned batches have schema:
  - name: add, type: AddFile.SCHEMA_WITH_STATS if shouldReadStats=true, otherwise AddFile.SCHEMA_WITHOUT_STATS
  - name: