| Modifier and Type | Method and Description |
|---|---|
| `java.util.Optional<Predicate>` | `getRemainingFilter()`<br>Get the remaining filter that is not guaranteed to be satisfied for the data Delta Kernel returns. |
| `CloseableIterator<FilteredColumnarBatch>` | `getScanFiles(Engine engine)`<br>Get an iterator of data files to scan. |
| `Row` | `getScanState(Engine engine)`<br>Get the scan state associated with the current scan. |
| `static CloseableIterator<FilteredColumnarBatch>` | `transformPhysicalData(Engine engine, Row scanState, Row scanFile, CloseableIterator<ColumnarBatch> physicalDataIter)`<br>Transform the physical data read from the table data file into the logical data that is expected from the Delta table. |

CloseableIterator<FilteredColumnarBatch> getScanFiles(Engine engine)

Parameters:
- engine - Engine instance to use in Delta Kernel.

Returns:
an iterator of FilteredColumnarBatchs where each selected row in the batch corresponds to one scan file. Schema of each row is defined as follows:

- add, type: struct
  - path, type: string, description: location of the file. The path is a URI as specified by RFC 2396 URI Generic Syntax, which needs to be decoded to get the data file path.
  - partitionValues, type: map(string, string), description: A map from partition column to value for this logical file.
  - size, type: long, description: size of the file.
  - modificationTime, type: long, description: the time this logical file was created, as milliseconds since the epoch.
  - dataChange, type: boolean, description: When false, the logical file must already be present in the table, or the records in the added file must be contained in one or more remove actions in the same version.
  - deletionVector, type: struct, description: Either null (or absent in JSON) when no DV is associated with this data file, or a struct (described below) that contains the necessary information about the DV that is part of this logical file. For a description of each member variable in `deletionVector`, see the Delta protocol specification.
    - storageType, type: string
    - pathOrInlineDv, type: string, description: The path is a URI as specified by RFC 2396 URI Generic Syntax, which needs to be decoded to get the data file path.
    - offset, type: long
    - sizeInBytes, type: long
    - cardinality, type: long
  - tags, type: map(string, string), description: Map containing metadata about the scan file.
- tableRoot, type: string
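
The schema above says that `path` is a URI that must be decoded before it can be used as a data file path. The following is a minimal sketch of that decoding step using only `java.net.URI` from the JDK. The class `ScanFilePaths` and its methods are hypothetical helpers, not part of Delta Kernel, and the sketch assumes that a relative `path` is meant to be resolved against `tableRoot`.

```java
import java.net.URI;

// Hypothetical helper, not part of Delta Kernel: it only illustrates the URI decoding step.
final class ScanFilePaths {
    private ScanFilePaths() {}

    /**
     * Resolves the URI-encoded path of a scan file against the table root.
     * Assumption of this sketch: relative paths are relative to tableRoot;
     * an absolute URI is returned unchanged by URI.resolve.
     */
    static URI resolve(String tableRoot, String encodedPath) {
        // A trailing slash makes URI.resolve treat the table root as a directory.
        String root = tableRoot.endsWith("/") ? tableRoot : tableRoot + "/";
        return URI.create(root).resolve(URI.create(encodedPath));
    }

    /** Returns the decoded path component of a resolved scan file URI. */
    static String decodedPath(URI uri) {
        // getPath() decodes percent-escapes such as %20; getRawPath() would keep them.
        return uri.getPath();
    }

    public static void main(String[] args) {
        URI file = resolve("file:/tmp/table", "country=US/part-0000%20a.parquet");
        System.out.println(decodedPath(file)); // /tmp/table/country=US/part-0000 a.parquet
    }
}
```

The encoded form stays available through `URI.toString()`, which is useful when the location has to be handed to another component that expects a URI rather than a decoded path.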

java.util.Optional<Predicate> getRemainingFilter()

Returns:
the remaining filter as a Predicate.

Row getScanState(Engine engine)

static CloseableIterator<FilteredColumnarBatch> transformPhysicalData(Engine engine, Row scanState, Row scanFile, CloseableIterator<ColumnarBatch> physicalDataIter) throws java.io.IOException

Parameters:
- engine - Connector-provided Engine implementation.
- scanState - Scan state returned by getScanState(Engine).
- scanFile - Scan file from which the physical data physicalDataIter is read.
- physicalDataIter - Iterator of ColumnarBatchs containing the physical data read from the scanFile.

Returns:
an iterator of FilteredColumnarBatchs. Each FilteredColumnarBatch instance contains the data read and an optional selection vector that indicates data rows as valid or invalid. It is the responsibility of the caller to close this iterator.

Throws:
java.io.IOException - when an error occurs while reading the data.
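
The returned batches pair the data with an optional selection vector, so a consumer has to skip rows that are marked invalid. The sketch below illustrates that idea with plain JDK types. `SelectionVectorSketch` and `validRows` are hypothetical stand-ins and do not reproduce the FilteredColumnarBatch API. The sketch also assumes that an absent selection vector means every row is valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Stand-in for illustration only; the real FilteredColumnarBatch API is not reproduced here.
final class SelectionVectorSketch {
    private SelectionVectorSketch() {}

    /**
     * Returns the rows whose selection vector entry is true.
     * Assumption of this sketch: an empty Optional means all rows are valid.
     */
    static <T> List<T> validRows(List<T> rows, Optional<boolean[]> selectionVector) {
        if (!selectionVector.isPresent()) {
            return new ArrayList<>(rows);
        }
        boolean[] selected = selectionVector.get();
        if (selected.length != rows.size()) {
            throw new IllegalArgumentException("selection vector must have one entry per row");
        }
        List<T> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (selected[i]) {
                result.add(rows.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        // The second row is marked invalid and is skipped.
        System.out.println(validRows(rows, Optional.of(new boolean[] {true, false, true}))); // [a, c]
        // No selection vector: every row is kept.
        System.out.println(validRows(rows, Optional.<boolean[]>empty())); // [a, b, c]
    }
}
```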