Package io.delta.kernel.internal
Class InternalScanFileUtils
Object
io.delta.kernel.internal.InternalScanFileUtils
Utilities to extract information from the scan file rows returned by Scan.getScanFiles(Engine).
Field Summary
static final int ADD_FILE_ORDINAL
static final Column ADD_FILE_PARTITION_COL_REF - Column expression referring to the `partitionValues` in the scan `add` file.
static final int ADD_FILE_STATS_ORDINAL
static final StructType SCAN_FILE_SCHEMA - Schema of the returned scan files.
static final StructType SCAN_FILE_SCHEMA_WITH_STATS - Schema of the returned scan files when ScanImpl.getScanFiles(Engine, boolean) is called with includeStats=true.
static StructField TABLE_ROOT_STRUCT_FIELD
Method Summary
static Row generateScanFileRow(FileStatus fileStatus) - Create a scan file row conforming to the schema SCAN_FILE_SCHEMA for the given file status.
static FileStatus getAddFileStatus(Row scanFileInfo) - Get the FileStatus of the AddFile from the given scan file Row.
static DeletionVectorDescriptor getDeletionVectorDescriptorFromRow(Row scanFile) - Create a DeletionVectorDescriptor from the add entry in the given scan file row.
static Map<String, String> getPartitionValues(Row scanFileInfo) - Get the partition columns and values belonging to the AddFile from the given scan file row.
static Column getPartitionValuesParsedRefInAddFile(String partitionColName) - Get a column reference for the given partition column name in the partitionValues_parsed column of the scan file row.
Field Details
ADD_FILE_PARTITION_COL_REF
public static final Column ADD_FILE_PARTITION_COL_REF
Column expression referring to the `partitionValues` in the scan `add` file.
TABLE_ROOT_STRUCT_FIELD
public static StructField TABLE_ROOT_STRUCT_FIELD
SCAN_FILE_SCHEMA
public static final StructType SCAN_FILE_SCHEMA
Schema of the returned scan files. A returned row may have an additional column, "add.stats", at the end of the "add" columns that is not represented in this schema. This column is read conditionally: only when a valid data skipping filter can be generated.
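The conditional "add.stats" column can be pictured with a small model in which the "add" struct is an ordered list of column names. This is only a sketch. The column names below are assumptions and do not reproduce the actual contents of SCAN_FILE_SCHEMA. The readStats flag stands in for whatever condition causes the column to be read, such as a usable data skipping filter or includeStats=true.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model of the "add" struct's columns in a scan file row (names are assumptions). */
class ScanFileSchemaSketch {
    // Hypothetical subset of the add columns; the real SCAN_FILE_SCHEMA may differ.
    static final List<String> ADD_COLUMNS =
        List.of("path", "partitionValues", "size", "modificationTime", "dataChange", "deletionVector");

    /** Returns the add column names, with "stats" appended last when statistics are read. */
    static List<String> addColumns(boolean readStats) {
        List<String> columns = new ArrayList<>(ADD_COLUMNS);
        if (readStats) {
            columns.add("stats"); // appended after every declared add column
        }
        return List.copyOf(columns);
    }

    public static void main(String[] args) {
        List<String> plain = addColumns(false);
        List<String> withStats = addColumns(true);
        // Appending at the end leaves the position of every other add column unchanged.
        System.out.println(withStats.subList(0, plain.size()).equals(plain)); // true
    }
}
```

Putting the optional column last means code that addresses the other add columns by position works the same whether or not stats were read.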
SCAN_FILE_SCHEMA_WITH_STATS
public static final StructType SCAN_FILE_SCHEMA_WITH_STATS
Schema of the returned scan files when ScanImpl.getScanFiles(Engine, boolean) is called with includeStats=true.
ADD_FILE_ORDINAL
public static final int ADD_FILE_ORDINAL
ADD_FILE_STATS_ORDINAL
public static final int ADD_FILE_STATS_ORDINAL
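Judging by their names, ADD_FILE_ORDINAL locates the "add" struct inside a scan file row, and ADD_FILE_STATS_ORDINAL locates "stats" inside that struct. The sketch below models rows as Object[] arrays, a hypothetical stand-in for Delta Kernel's Row, and derives each ordinal from a list of field names. The field layouts and this way of deriving the ordinals are illustrative assumptions.

```java
import java.util.List;

/** Hypothetical positional rows: a scan file row holds an "add" struct, which holds the file fields. */
class OrdinalSketch {
    // Assumed field layouts for illustration only.
    static final List<String> SCAN_FILE_FIELDS = List.of("add", "tableRoot");
    static final List<String> ADD_FIELDS_WITH_STATS = List.of("path", "size", "stats");

    // Ordinals derived from the field lists rather than hard-coded.
    static final int ADD_FILE_ORDINAL = SCAN_FILE_FIELDS.indexOf("add");
    static final int ADD_FILE_STATS_ORDINAL = ADD_FIELDS_WITH_STATS.indexOf("stats");

    /** Reads the raw stats JSON string of the add entry using the two ordinals. */
    static String statsJson(Object[] scanFileRow) {
        Object[] add = (Object[]) scanFileRow[ADD_FILE_ORDINAL];
        return (String) add[ADD_FILE_STATS_ORDINAL];
    }

    public static void main(String[] args) {
        Object[] add = {"part-0000.parquet", 1024L, "{\"numRecords\":10}"};
        Object[] scanFileRow = {add, "file:/tmp/table/"};
        System.out.println(statsJson(scanFileRow)); // {"numRecords":10}
    }
}
```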
Method Details
getAddFileStatus
public static FileStatus getAddFileStatus(Row scanFileInfo)
Get the FileStatus of the AddFile from the given scan file Row. The FileStatus contains metadata about the file.
Parameters:
scanFileInfo - Row representing one scan file.
Returns:
a FileStatus object created from the given scan file row.
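The following sketch shows what extracting file metadata from a scan file row involves. Object[] arrays stand in for Row, and a hypothetical FileStatusSketch record (path, size, modification time) stands in for FileStatus. The field positions are assumptions. The sketch also resolves the add path against the table root, because the Delta protocol allows an add path to be relative to the table root. This page does not say whether getAddFileStatus does the same.

```java
import java.net.URI;

/** Hypothetical stand-in for FileStatus: path, size in bytes, modification time (epoch ms). */
record FileStatusSketch(String path, long size, long modificationTime) {}

/** Illustrative layout: scan file row = {add, tableRoot}; add = {path, size, modificationTime}. */
class AddFileStatusSketch {
    static final int ADD = 0, TABLE_ROOT = 1;          // assumed scan file row positions
    static final int PATH = 0, SIZE = 1, MOD_TIME = 2; // assumed add struct positions

    static FileStatusSketch addFileStatus(Object[] scanFileRow) {
        Object[] add = (Object[]) scanFileRow[ADD];
        String tableRoot = (String) scanFileRow[TABLE_ROOT];
        // Relative add paths are resolved against the table root URI; absolute ones are kept as-is.
        String absolutePath = URI.create(tableRoot).resolve((String) add[PATH]).toString();
        return new FileStatusSketch(absolutePath, (Long) add[SIZE], (Long) add[MOD_TIME]);
    }

    public static void main(String[] args) {
        Object[] add = {"part-0000.parquet", 2048L, 1700000000000L};
        Object[] scanFileRow = {add, "file:/tmp/table/"};
        System.out.println(addFileStatus(scanFileRow));
        // FileStatusSketch[path=file:/tmp/table/part-0000.parquet, size=2048, modificationTime=1700000000000]
    }
}
```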
getPartitionValues
public static Map<String, String> getPartitionValues(Row scanFileInfo)
Get the partition columns and values belonging to the AddFile from the given scan file row.
Parameters:
scanFileInfo - Row representing one scan file.
Returns:
Map of partition column name to partition column value.
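Partition values in a Delta add entry are serialized as strings, so a caller typically parses each value according to its partition column's type. Here is a minimal sketch, again with Object[] as a hypothetical stand-in for Row and assumed field positions.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class PartitionValuesSketch {
    static final int ADD = 0;              // assumed position of "add" in the scan file row
    static final int PARTITION_VALUES = 1; // assumed position of "partitionValues" in the add struct

    /** Returns a read-only copy of the partition column name -> serialized value map. */
    @SuppressWarnings("unchecked")
    static Map<String, String> partitionValues(Object[] scanFileRow) {
        Object[] add = (Object[]) scanFileRow[ADD];
        Map<String, String> raw = (Map<String, String>) add[PARTITION_VALUES];
        return Collections.unmodifiableMap(new LinkedHashMap<>(raw));
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("date", "2024-01-01");
        raw.put("region", "eu");
        Object[] scanFileRow = {new Object[] {"date=2024-01-01/region=eu/part-0000.parquet", raw}};

        Map<String, String> values = partitionValues(scanFileRow);
        // The caller converts each string to its column type, e.g. a DATE partition column:
        LocalDate date = LocalDate.parse(values.get("date"));
        System.out.println(values + " " + date.getYear()); // {date=2024-01-01, region=eu} 2024
    }
}
```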
generateScanFileRow
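public static Row generateScanFileRow(FileStatus fileStatus)
Create a scan file row conforming to the schema SCAN_FILE_SCHEMA for the given file status. This is used when creating the scan file row for reading commit or checkpoint files.
Parameters:
fileStatus - FileStatus of the file to represent as a scan file row.
Returns:
a scan file Row conforming to SCAN_FILE_SCHEMA.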
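generateScanFileRow works in the opposite direction: it wraps a plain file status in a scan-file-shaped row, so commit and checkpoint files can be handled as scan files. The sketch below is a hypothetical model. FileInfo stands in for FileStatus, Object[] stands in for Row, and the choice of which add fields to leave null is an assumption rather than documented behavior.

```java
/** Hypothetical stand-in for FileStatus. */
record FileInfo(String path, long size, long modificationTime) {}

class GenerateScanFileRowSketch {
    // Assumed layouts: scan file row = {add, tableRoot};
    // add = {path, partitionValues, size, modificationTime, deletionVector}.
    static final int ADD = 0, TABLE_ROOT = 1;
    static final int PATH = 0, PARTITION_VALUES = 1, SIZE = 2, MOD_TIME = 3, DELETION_VECTOR = 4;

    /** Builds a scan-file-shaped row whose add entry carries only path, size, and modification time. */
    static Object[] scanFileRow(FileInfo file) {
        Object[] add = new Object[5];
        add[PATH] = file.path();
        add[PARTITION_VALUES] = null; // log files have no partition values
        add[SIZE] = file.size();
        add[MOD_TIME] = file.modificationTime();
        add[DELETION_VECTOR] = null;  // and no deletion vector
        Object[] row = new Object[2];
        row[ADD] = add;
        row[TABLE_ROOT] = null;       // left unset: the path is assumed to be absolute already
        return row;
    }

    public static void main(String[] args) {
        FileInfo commit = new FileInfo("file:/tmp/table/_delta_log/00000000000000000000.json", 512L, 1700000000000L);
        Object[] add = (Object[]) scanFileRow(commit)[ADD];
        System.out.println(add[PATH] + " " + add[SIZE]);
    }
}
```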
getDeletionVectorDescriptorFromRow
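public static DeletionVectorDescriptor getDeletionVectorDescriptorFromRow(Row scanFile)
Create a DeletionVectorDescriptor from the add entry in the given scan file row.
Parameters:
scanFile - Row representing one scan file.
Returns:
a DeletionVectorDescriptor created from the add entry of the given scan file row.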
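In the Delta protocol, a deletion vector descriptor has the fields storageType, pathOrInlineDv, offset (absent for inline vectors), sizeInBytes, and cardinality. The sketch below copies those fields out of a hypothetical positional add entry into a record. The positions are assumptions. Returning null when the add entry has no deletion vector is a choice made for this sketch; this page does not document that case.

```java
/** Hypothetical mirror of the Delta protocol's deletion vector descriptor fields. */
record DvDescriptorSketch(
    String storageType, String pathOrInlineDv, Integer offset, int sizeInBytes, long cardinality) {}

class DvFromRowSketch {
    static final int ADD = 0; // assumed position of "add" in the scan file row
    static final int DV = 1;  // assumed position of "deletionVector" in the add struct

    /** Returns the descriptor of the add entry, or null if it carries no deletion vector. */
    static DvDescriptorSketch fromScanFile(Object[] scanFileRow) {
        Object[] add = (Object[]) scanFileRow[ADD];
        Object[] dv = (Object[]) add[DV];
        if (dv == null) {
            return null;
        }
        // offset stays an Integer because inline deletion vectors have no offset.
        return new DvDescriptorSketch(
            (String) dv[0], (String) dv[1], (Integer) dv[2], (Integer) dv[3], (Long) dv[4]);
    }

    public static void main(String[] args) {
        Object[] dv = {"u", "hypotheticalEncodedId", 1, 40, 6L};
        Object[] scanFileRow = {new Object[] {"part-0000.parquet", dv}};
        System.out.println(fromScanFile(scanFileRow).cardinality()); // 6
    }
}
```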
getPartitionValuesParsedRefInAddFile
public static Column getPartitionValuesParsedRefInAddFile(String partitionColName)
Get a column reference for the given partition column name in the partitionValues_parsed column of the scan file row.
Parameters:
partitionColName - Partition column name.
Returns:
Column reference.
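getPartitionValues returns a map of strings. The partitionValues_parsed struct, which Delta checkpoints may optionally carry, instead holds partition values in their native types, which is presumably why a reference into it is useful for typed partition filters. The sketch models a column reference as its list of field names. The path add.partitionValues_parsed.<name> is inferred from the method name and description and is not confirmed here, and ColumnRefSketch is hypothetical.

```java
import java.util.List;

/** Hypothetical stand-in for a nested column reference, stored as its path of field names. */
record ColumnRefSketch(List<String> names) {
    @Override
    public String toString() {
        return String.join(".", names);
    }
}

class PartitionParsedRefSketch {
    /** Builds a reference to add.partitionValues_parsed.<partitionColName> (assumed path layout). */
    static ColumnRefSketch partitionValuesParsedRef(String partitionColName) {
        return new ColumnRefSketch(List.of("add", "partitionValues_parsed", partitionColName));
    }

    public static void main(String[] args) {
        System.out.println(partitionValuesParsedRef("date")); // add.partitionValues_parsed.date
    }
}
```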