public class LogReplay
extends Object
- The most recent AddFile and accompanying metadata for any `(path, dv id)` tuple wins.
- RemoveFile deletes a corresponding AddFile. A RemoveFile "corresponds" to
the AddFile that matches both the parquet file URI *and* the deletion vector's URI (if any).
- The most recent Metadata wins.
- The most recent Protocol version wins.
- For each `(path, dv id)` tuple, this class should always output only one FileAction
(either AddFile or RemoveFile)
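
The resolution rules above can be illustrated with a minimal sketch. It is not the Kernel's implementation: `SketchFileAction`, `ReplaySketch`, `activeAddFiles`, and `latestNonNull` are hypothetical names introduced here, and the sketch assumes the actions are already ordered from newest to oldest commit. Under that assumption, the first action seen for a `(path, dv id)` tuple is the most recent one, so it wins, and any older action with the same tuple is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for an AddFile/RemoveFile action; not a Kernel class.
final class SketchFileAction {
    final boolean isAdd;  // true = AddFile, false = RemoveFile
    final String path;    // data file URI
    final String dvId;    // deletion vector id, or null when there is none

    SketchFileAction(boolean isAdd, String path, String dvId) {
        this.isAdd = isAdd;
        this.path = path;
        this.dvId = dvId;
    }

    // Identity of a file action: the (path, dv id) tuple.
    // Arrays.asList tolerates a null dvId and provides equals/hashCode.
    List<String> key() {
        return Arrays.asList(path, dvId);
    }
}

// Hypothetical helper class illustrating the replay rules.
final class ReplaySketch {
    // Assumes newestFirst is ordered from the most recent action to the oldest.
    static List<SketchFileAction> activeAddFiles(List<SketchFileAction> newestFirst) {
        Set<List<String>> seen = new HashSet<>();
        List<SketchFileAction> active = new ArrayList<>();
        for (SketchFileAction action : newestFirst) {
            // Only the first (most recent) action per key is considered.
            // It is kept only if it is an AddFile; a RemoveFile tombstones the key.
            if (seen.add(action.key()) && action.isAdd) {
                active.add(action);
            }
        }
        return active;
    }

    // "Most recent wins" for Protocol or Metadata: the first non-null entry
    // in a newest-first list, or empty if every entry is null.
    static <T> Optional<T> latestNonNull(List<T> newestFirst) {
        for (T candidate : newestFirst) {
            if (candidate != null) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

Keying on the full `(path, dv id)` tuple rather than the path alone means that an AddFile for a data file with a new deletion vector is tracked separately from the RemoveFile that retires the same data file without one.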
This class exposes the following public APIs:
- getProtocol(): latest non-null Protocol
- getMetadata(): latest non-null Metadata
- getAddFilesAsColumnarBatches(boolean): return all active (not tombstoned) AddFiles as
ColumnarBatches

| Modifier and Type | Field and Description |
|---|---|
| static int | ADD_FILE_DV_ORDINAL |
| static int | ADD_FILE_ORDINAL |
| static int | ADD_FILE_PATH_ORDINAL |
| static StructType | PROTOCOL_METADATA_READ_SCHEMA: Read schema when searching for the latest Protocol and Metadata. |
| static int | REMOVE_FILE_DV_ORDINAL |
| static int | REMOVE_FILE_ORDINAL |
| static int | REMOVE_FILE_PATH_ORDINAL |
| static StructType | SET_TRANSACTION_READ_SCHEMA: Read schema when searching for just the transaction identifiers. |

| Constructor and Description |
|---|
| LogReplay(Path logPath, Path dataPath, long snapshotVersion, TableClient tableClient, LogSegment logSegment, java.util.Optional<SnapshotHint> snapshotHint) |

| Modifier and Type | Method and Description |
|---|---|
| CloseableIterator<FilteredColumnarBatch> | getAddFilesAsColumnarBatches(boolean shouldReadStats): Returns an iterator of FilteredColumnarBatch representing all the active AddFiles in the table. |
| static StructType | getAddRemoveReadSchema(boolean shouldReadStats): Read schema when searching for all the active AddFiles. |
| java.util.Optional<Long> | getLatestTransactionIdentifier(String applicationId) |
| Metadata | getMetadata() |
| Protocol | getProtocol() |

public static final StructType PROTOCOL_METADATA_READ_SCHEMA
public static final StructType SET_TRANSACTION_READ_SCHEMA
public static int ADD_FILE_ORDINAL
public static int ADD_FILE_PATH_ORDINAL
public static int ADD_FILE_DV_ORDINAL
public static int REMOVE_FILE_ORDINAL
public static int REMOVE_FILE_PATH_ORDINAL
public static int REMOVE_FILE_DV_ORDINAL
public LogReplay(Path logPath, Path dataPath, long snapshotVersion, TableClient tableClient, LogSegment logSegment, java.util.Optional<SnapshotHint> snapshotHint)
public static StructType getAddRemoveReadSchema(boolean shouldReadStats)
public Protocol getProtocol()
public Metadata getMetadata()
public java.util.Optional<Long> getLatestTransactionIdentifier(String applicationId)
public CloseableIterator<FilteredColumnarBatch> getAddFilesAsColumnarBatches(boolean shouldReadStats)
Returns an iterator of FilteredColumnarBatch representing all the active AddFiles
in the table.
Statistics are read for the AddFiles only when shouldReadStats is true. The
returned batches have the schema:
- add, of type AddFile.SCHEMA_WITH_STATS if shouldReadStats=true, otherwise
AddFile.SCHEMA_WITHOUT_STATS