public class LogReplay
extends Object
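Replays a history of actions, resolving them to produce the current state of the table. The
protocol for resolution is as follows:
- The most recent AddFile and accompanying metadata for any `(path, dv id)` tuple wins.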
- RemoveFile deletes a corresponding AddFile. A RemoveFile "corresponds" to
the AddFile that matches both the parquet file URI *and* the deletion vector's URI (if any).
- The most recent Metadata wins.
- The most recent Protocol version wins.
- For each `(path, dv id)` tuple, this class should always output only one FileAction
(either AddFile or RemoveFile)
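
The file-action rules above can be illustrated with a minimal, self-contained sketch. It is a simplified model, not the Kernel implementation: `LogReplaySketch`, `FileAction`, `FileKey`, and `activeAddFiles` are hypothetical names introduced here, and the sketch assumes the actions are supplied newest first, so the first action seen for a `(path, dv id)` tuple is the most recent one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Illustrative model of the reconciliation rules; not the Kernel implementation. */
public class LogReplaySketch {

    /** Hypothetical key for a logical file: the (path, dv id) tuple. */
    public record FileKey(String path, Optional<String> dvId) {}

    /** Hypothetical stand-in for AddFile/RemoveFile; the real actions carry more fields. */
    public record FileAction(boolean isAdd, String path, Optional<String> dvId) {
        public FileKey key() {
            return new FileKey(path, dvId);
        }
    }

    /**
     * Replays actions ordered newest first and returns the active AddFiles.
     * Assumption: the caller has already sorted the actions newest first.
     */
    public static List<FileAction> activeAddFiles(List<FileAction> newestFirst) {
        Set<FileKey> seen = new HashSet<>();
        List<FileAction> active = new ArrayList<>();
        for (FileAction action : newestFirst) {
            // Set.add returns false if a newer action already claimed this key,
            // so each key yields at most one action (an add or a remove).
            boolean isMostRecentForKey = seen.add(action.key());
            if (isMostRecentForKey && action.isAdd()) {
                active.add(action);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        List<FileAction> newestFirst = List.of(
            // version 2: b.parquet is rewritten with a deletion vector
            new FileAction(false, "b.parquet", Optional.empty()),
            new FileAction(true, "b.parquet", Optional.of("dv1")),
            // version 1: a.parquet is removed
            new FileAction(false, "a.parquet", Optional.empty()),
            // version 0: both files are added
            new FileAction(true, "a.parquet", Optional.empty()),
            new FileAction(true, "b.parquet", Optional.empty()));

        for (FileAction add : activeAddFiles(newestFirst)) {
            System.out.println(add.path() + " dv=" + add.dvId().orElse("none"));
        }
    }
}
```

In this example only `b.parquet` with deletion vector `dv1` remains active: the older `(b.parquet, no dv)` and `(a.parquet, no dv)` adds are each shadowed by a more recent RemoveFile with the same key.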
This class exposes the following public APIs
- getProtocol(): latest non-null Protocol
- getMetadata(): latest non-null Metadata
- getAddFilesAsColumnarBatches(boolean, java.util.Optional<io.delta.kernel.expressions.Predicate>): return all active (not tombstoned) AddFiles as
ColumnarBatches

| Modifier and Type | Field and Description |
|---|---|
| `static int` | `ADD_FILE_DV_ORDINAL` |
| `static int` | `ADD_FILE_ORDINAL` |
| `static int` | `ADD_FILE_PATH_ORDINAL` |
| `static String` | `ADDFILE_FIELD_NAME` |
| `static StructType` | `PROTOCOL_METADATA_READ_SCHEMA`: Read schema when searching for the latest Protocol and Metadata. |
| `static int` | `REMOVE_FILE_DV_ORDINAL` |
| `static int` | `REMOVE_FILE_ORDINAL` |
| `static int` | `REMOVE_FILE_PATH_ORDINAL` |
| `static String` | `REMOVEFILE_FIELD_NAME` |
| `static StructType` | `SET_TRANSACTION_READ_SCHEMA`: Read schema when searching for just the transaction identifiers. |
| `static String` | `SIDECAR_FIELD_NAME` |

| Constructor and Description |
|---|
| `LogReplay(Path logPath, Path dataPath, long snapshotVersion, Engine engine, LogSegment logSegment, java.util.Optional<SnapshotHint> snapshotHint)` |

| Modifier and Type | Method and Description |
|---|---|
| `static boolean` | `containsAddOrRemoveFileActions(StructType schema)` |
| `CloseableIterator<FilteredColumnarBatch>` | `getAddFilesAsColumnarBatches(boolean shouldReadStats, java.util.Optional<Predicate> checkpointPredicate)`: Returns an iterator of FilteredColumnarBatch representing all the active AddFiles in the table. |
| `static StructType` | `getAddRemoveReadSchema(boolean shouldReadStats)`: Read schema when searching for all the active AddFiles. |
| `java.util.Optional<Long>` | `getLatestTransactionIdentifier(String applicationId)` |
| `Metadata` | `getMetadata()` |
| `Protocol` | `getProtocol()` |
| `static StructType` | `withSidecarFileSchema(StructType schema)` |

public static final StructType PROTOCOL_METADATA_READ_SCHEMA
public static final StructType SET_TRANSACTION_READ_SCHEMA
public static String SIDECAR_FIELD_NAME
public static String ADDFILE_FIELD_NAME
public static String REMOVEFILE_FIELD_NAME
public static int ADD_FILE_ORDINAL
public static int ADD_FILE_PATH_ORDINAL
public static int ADD_FILE_DV_ORDINAL
public static int REMOVE_FILE_ORDINAL
public static int REMOVE_FILE_PATH_ORDINAL
public static int REMOVE_FILE_DV_ORDINAL
public LogReplay(Path logPath, Path dataPath, long snapshotVersion, Engine engine, LogSegment logSegment, java.util.Optional<SnapshotHint> snapshotHint)
public static StructType withSidecarFileSchema(StructType schema)
public static boolean containsAddOrRemoveFileActions(StructType schema)
public static StructType getAddRemoveReadSchema(boolean shouldReadStats)
public Protocol getProtocol()
public Metadata getMetadata()
public java.util.Optional<Long> getLatestTransactionIdentifier(String applicationId)
public CloseableIterator<FilteredColumnarBatch> getAddFilesAsColumnarBatches(boolean shouldReadStats, java.util.Optional<Predicate> checkpointPredicate)
Returns an iterator of FilteredColumnarBatch representing all the active AddFiles
in the table.
Statistics are conditionally read for the AddFiles based on `shouldReadStats`. The
returned batches have the schema:
- `add`, of type `AddFile.SCHEMA_WITH_STATS` if `shouldReadStats=true`, otherwise
`AddFile.SCHEMA_WITHOUT_STATS`