| Modifier and Type | Method and Description |
|---|---|
| `java.util.Optional<Predicate>` | `getRemainingFilter()`: Get the remaining filter that is not guaranteed to be satisfied for the data Delta Kernel returns. |
| `CloseableIterator<FilteredColumnarBatch>` | `getScanFiles(TableClient tableClient)`: Get an iterator of data files to scan. |
| `Row` | `getScanState(TableClient tableClient)`: Get the scan state associated with the current scan. |
| `static CloseableIterator<FilteredColumnarBatch>` | `transformPhysicalData(TableClient tableClient, Row scanState, Row scanFile, CloseableIterator<ColumnarBatch> physicalDataIter)`: Transform the physical data read from the table data file into the logical data that is expected out of the Delta table. |

`CloseableIterator<FilteredColumnarBatch> getScanFiles(TableClient tableClient)`

Parameters:
- `tableClient` - `TableClient` instance to use in Delta Kernel.

Returns: an iterator of `FilteredColumnarBatch`es in which each selected row of a batch corresponds to one scan file. The schema of each row is defined as follows:
- `add`, type: struct
  - `path`, type: string, description: location of the file.
  - `partitionValues`, type: map(string, string), description: a map from partition column to value for this logical file.
  - `size`, type: long, description: size of the file.
  - `modificationTime`, type: long, description: the time this logical file was created, as milliseconds since the epoch.
  - `dataChange`, type: boolean, description: when false, the logical file must already be present in the table, or the records in the added file must be contained in one or more remove actions in the same version.
  - `deletionVector`, type: struct, description: either null (or absent in JSON) when no deletion vector (DV) is associated with this data file, or a struct (described below) that contains the information needed about the DV that is part of this logical file. For a description of each member, see the Deletion Vectors section of the Delta protocol specification.
    - `storageType`, type: string
    - `pathOrInlineDv`, type: string
    - `offset`, type: long
    - `sizeInBytes`, type: long
    - `cardinality`, type: long

`java.util.Optional<Predicate> getRemainingFilter()`

Returns: the remaining filter as a `Predicate`.
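
Because the filter returned by `getRemainingFilter()` is not guaranteed to hold for every row Delta Kernel returns, a connector that needs exact results would evaluate it itself on the data it reads. The sketch below illustrates that contract with JDK types only: `java.util.function.Predicate` over `int[]` rows stands in for Kernel's `Predicate` expression and row types, and `applyRemainingFilter` is a hypothetical helper, not part of the Delta Kernel API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class RemainingFilterSketch {
    // Hypothetical helper: keeps only the rows that satisfy the remaining filter.
    // An empty Optional means nothing remains to be checked, so all rows are kept.
    static List<int[]> applyRemainingFilter(List<int[]> rows, Optional<Predicate<int[]>> remainingFilter) {
        if (!remainingFilter.isPresent()) {
            return rows;
        }
        Predicate<int[]> filter = remainingFilter.get();
        List<int[]> kept = new ArrayList<>();
        for (int[] row : rows) {
            if (filter.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Rows returned after file-level data skipping; column 0 is a hypothetical "id".
        List<int[]> returned = Arrays.asList(new int[]{1}, new int[]{5}, new int[]{9});
        // Suppose the query filter was "id > 4" and it could not be guaranteed per row.
        Optional<Predicate<int[]>> remaining = Optional.of(row -> row[0] > 4);
        List<int[]> result = applyRemainingFilter(returned, remaining);
        System.out.println(result.size()); // prints 2
    }
}
```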

`Row getScanState(TableClient tableClient)`

Parameters:
- `tableClient` - `TableClient` instance to use in Delta Kernel.

Returns: the scan state in `Row` format.

`static CloseableIterator<FilteredColumnarBatch> transformPhysicalData(TableClient tableClient, Row scanState, Row scanFile, CloseableIterator<ColumnarBatch> physicalDataIter) throws java.io.IOException`

Parameters:
- `tableClient` - connector-provided `TableClient` implementation.
- `scanState` - scan state returned by `getScanState(TableClient)`.
- `scanFile` - scan file from which the physical data in `physicalDataIter` is read.
- `physicalDataIter` - iterator of `ColumnarBatch`es containing the physical data read from `scanFile`.

Returns: the logical data as an iterator of `FilteredColumnarBatch`es. Each `FilteredColumnarBatch` instance contains the data read and an optional selection vector that marks each data row as valid or invalid. The caller is responsible for closing this iterator.

Throws:
- `java.io.IOException` - when an error occurs while reading the data.
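
The selection-vector and closing contract of `transformPhysicalData` can be illustrated with plain JDK types. In this sketch, `SketchBatch` and `SketchIterator` are hypothetical stand-ins for `FilteredColumnarBatch` and `CloseableIterator`, not the Kernel classes, and treating an absent selection vector as "all rows valid" is an assumption of the sketch. The consumer reads only the rows marked valid and closes the iterator when it is done, as the caller is required to do.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;

public class SelectionVectorSketch {
    // Hypothetical stand-in for FilteredColumnarBatch: one int column plus an
    // optional selection vector (true = valid row, false = invalid row).
    static final class SketchBatch {
        final int[] values;
        final Optional<boolean[]> selection;

        SketchBatch(int[] values, Optional<boolean[]> selection) {
            this.values = values;
            this.selection = selection;
        }
    }

    // Hypothetical stand-in for CloseableIterator: an Iterator the caller must close.
    static final class SketchIterator implements Iterator<SketchBatch>, AutoCloseable {
        private final Iterator<SketchBatch> delegate;
        boolean closed = false;

        SketchIterator(List<SketchBatch> batches) {
            this.delegate = batches.iterator();
        }

        @Override
        public boolean hasNext() {
            return !closed && delegate.hasNext();
        }

        @Override
        public SketchBatch next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return delegate.next();
        }

        @Override
        public void close() {
            closed = true; // a real iterator would release file handles or buffers here
        }
    }

    // Sums only the valid rows; an absent selection vector is treated as "all rows valid".
    static long sumValidRows(SketchIterator batches) {
        long sum = 0;
        try (SketchIterator it = batches) { // the caller closes the iterator
            while (it.hasNext()) {
                SketchBatch batch = it.next();
                for (int i = 0; i < batch.values.length; i++) {
                    boolean valid = !batch.selection.isPresent() || batch.selection.get()[i];
                    if (valid) {
                        sum += batch.values[i];
                    }
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        SketchIterator it = new SketchIterator(Arrays.asList(
            new SketchBatch(new int[]{1, 2, 3}, Optional.of(new boolean[]{true, false, true})),
            new SketchBatch(new int[]{10, 20}, Optional.empty())));
        System.out.println(sumValidRows(it)); // prints 34
        System.out.println(it.closed);        // prints true
    }
}
```

Using try-with-resources ties closing to scope, so the iterator is released even if processing a batch throws.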