class CdcAddFileIndex extends TahoeBatchFileIndex
A TahoeFileIndex for scanning a sequence of added files as CDC (change data capture) data. It is similar to TahoeBatchFileIndex, with some special handling to attach the log version and CDC type to each file.
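The per-file tagging described above can be sketched as follows. This is a minimal sketch, not Delta's implementation: SimpleAddFile and SimpleCdcSpec are hypothetical, simplified stand-ins for AddFile and CDCDataSpec. Carrying the commit version and change type as extra partition values, under the change data feed column names _commit_version and _change_type, is an assumption made for illustration.

```scala
// Hypothetical, simplified stand-ins for Delta's AddFile and CDCDataSpec;
// the real classes carry many more fields.
final case class SimpleAddFile(path: String, partitionValues: Map[String, String])
final case class SimpleCdcSpec(version: Long, actions: Seq[SimpleAddFile])

object CdcTagging {
  // Column names used in Delta's change data feed output.
  val CommitVersionCol = "_commit_version"
  val ChangeTypeCol = "_change_type"

  // Flatten the per-version groups and tag every file with the version it was
  // added in and the "insert" change type. Assumption: the tags travel as extra
  // partition values so the scan can expose them as constant columns per file.
  def tagAddedFiles(filesByVersion: Seq[SimpleCdcSpec]): Seq[SimpleAddFile] =
    filesByVersion.flatMap { spec =>
      spec.actions.map { f =>
        f.copy(partitionValues = f.partitionValues +
          (CommitVersionCol -> spec.version.toString) +
          (ChangeTypeCol -> "insert"))
      }
    }
}
```

Every file keeps its original partition values; only the two CDC entries are added, so files from different versions remain distinguishable after flattening.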
Linear Supertypes: TahoeBatchFileIndex, TahoeFileIndexWithSnapshotDescriptor, TahoeFileIndex, SnapshotDescriptor, SupportsRowIndexFilters, FileIndex, AnyRef, Any
Instance Constructors
- new CdcAddFileIndex(spark: SparkSession, filesByVersion: Seq[CDCDataSpec[AddFile]], deltaLog: DeltaLog, path: Path, snapshot: SnapshotDescriptor, rowIndexFilters: Option[Map[String, RowIndexFilterType]] = None)
- spark
The Spark session.
- filesByVersion
Grouped FileActions, one per table version.
- deltaLog
The delta log instance.
- path
The table's data path.
- snapshot
The snapshot from which CDC data is read.
- rowIndexFilters
Map from URI-encoded file path to a row index filter type.
Note: When modifying this file index, please also consider the other CDC-related file indexes, such as TahoeChangeFileIndex and TahoeRemoveFileIndex.
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def absolutePath(child: String): Path
- Definition Classes
- TahoeFileIndex
- val actionType: String
- Definition Classes
- TahoeBatchFileIndex
- val addFiles: Seq[AddFile]
- Definition Classes
- TahoeBatchFileIndex
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- val deltaLog: DeltaLog
- Definition Classes
- TahoeFileIndex → SnapshotDescriptor
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def fileStatusWithMetadataFromAddFile(addFile: AddFile): FileStatusWithMetadata
Generates a FileStatusWithMetadata using data extracted from a given AddFile.
- Definition Classes
- TahoeFileIndex
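A minimal sketch of the conversion, using hypothetical simplified types (AddFileInfo, SimpleFileStatus, SimpleFileStatusWithMetadata) in place of Spark's FileStatusWithMetadata and Hadoop's FileStatus. Which AddFile fields end up in the metadata map is an assumption here; the base_row_id key is purely illustrative.

```scala
// Hypothetical simplified types. Spark's FileStatusWithMetadata pairs a Hadoop
// FileStatus with a metadata map; plain case classes model that here.
final case class AddFileInfo(path: String, size: Long, modificationTime: Long,
                             baseRowId: Option[Long] = None)
final case class SimpleFileStatus(path: String, length: Long, modificationTime: Long,
                                  isDirectory: Boolean)
final case class SimpleFileStatusWithMetadata(status: SimpleFileStatus,
                                              metadata: Map[String, Any])

object FileStatusFromAddFile {
  // Resolve a relative AddFile path against the table's data path.
  // Simplification: a path starting with "/" is treated as already absolute.
  def absolutePath(tablePath: String, child: String): String =
    if (child.startsWith("/")) child else tablePath.stripSuffix("/") + "/" + child

  def fromAddFile(tablePath: String, add: AddFileInfo): SimpleFileStatusWithMetadata = {
    val status = SimpleFileStatus(
      path = absolutePath(tablePath, add.path),
      length = add.size,
      modificationTime = add.modificationTime,
      isDirectory = false) // an AddFile always refers to a data file
    // Only include per-file metadata that is actually present (hypothetical key).
    val metadata: Map[String, Any] =
      add.baseRowId.fold(Map.empty[String, Any])(id => Map("base_row_id" -> id))
    SimpleFileStatusWithMetadata(status, metadata)
  }
}
```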
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- def getBasePath(filePath: Path): Option[Path]
Returns the path of the base directory of the given file path (i.e. its parent directory with all the partition directories stripped off).
- Definition Classes
- TahoeFileIndex
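The definition above can be illustrated with a sketch that assumes a Hive-style column=value directory layout and a known, ordered list of partition columns. BasePaths and basePath are hypothetical names, and this string-based approach is not necessarily how TahoeFileIndex computes the base path.

```scala
object BasePaths {
  // Strip the file name, then strip one trailing "column=value" directory per
  // partition column. Returns None if the layout does not match the columns.
  def basePath(filePath: String, partitionColumns: Seq[String]): Option[String] = {
    val dirs = filePath.split('/').toSeq.dropRight(1) // parent directory segments
    val n = partitionColumns.size
    if (dirs.size < n) None
    else {
      val (base, partitionDirs) = dirs.splitAt(dirs.size - n)
      val layoutMatches = partitionDirs.zip(partitionColumns).forall {
        case (segment, column) => segment.startsWith(column + "=")
      }
      if (layoutMatches) Some(base.mkString("/")) else None
    }
  }
}
```

For example, under these assumptions "/tbl/date=2024-01-01/region=us/part-0.parquet" with partition columns date and region yields "/tbl".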
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getPartitionValuesRow(partitionValues: Map[String, String]): GenericInternalRow
- Attributes
- protected
- Definition Classes
- TahoeFileIndex
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def inputFiles: Array[String]
- Definition Classes
- CdcAddFileIndex → TahoeBatchFileIndex → FileIndex
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def listFiles(partitionFilters: Seq[Expression], dataFilters: Seq[Expression]): Seq[PartitionDirectory]
- Definition Classes
- TahoeFileIndex → FileIndex
- def listPartitionsAsAddFiles(partitionFilters: Seq[Expression], dataFilters: Seq[Expression]): (Seq[(InternalRow, Seq[AddFile])], Seq[AddFile])
Returns (i) tuples mapping each partition directory to its AddFile actions and (ii) the collection of matched AddFiles, i.e. those that satisfy the partition and data filters. In effect, (ii) contains all the files associated with the identified partitions.
- Definition Classes
- TahoeFileIndex
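The shape of the return value can be sketched with plain Scala collections. This is an assumption-laden illustration: PartFile and listPartitions are hypothetical, partition values are modeled as a Map instead of an InternalRow, and the Catalyst partition and data filters are collapsed into a single predicate.

```scala
// Hypothetical simplified file entry; partition values stand in for InternalRow.
final case class PartFile(path: String, partitionValues: Map[String, String])

object PartitionListing {
  // Returns (i) each matching partition paired with its files and
  // (ii) the flat list of all matched files.
  def listPartitions(
      files: Seq[PartFile],
      partitionFilter: Map[String, String] => Boolean
  ): (Seq[(Map[String, String], Seq[PartFile])], Seq[PartFile]) = {
    val matched = files.filter(f => partitionFilter(f.partitionValues))
    // Group matched files by partition, keeping first-seen partition order.
    val byPartition = matched.groupBy(_.partitionValues)
    val grouped = matched.map(_.partitionValues).distinct.map(pv => pv -> byPartition(pv))
    (grouped, matched)
  }
}
```

Both halves are derived from the same filtered list, so the flat collection is exactly the union of the per-partition groups.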
- def makePartitionDirectories(partitionValuesToFiles: Seq[(InternalRow, Seq[AddFile])]): Seq[PartitionDirectory]
- Definition Classes
- TahoeFileIndex
- def matchingFiles(partitionFilters: Seq[Expression], dataFilters: Seq[Expression]): Seq[AddFile]
Returns all matching/valid files for the given partitionFilters and dataFilters. Implementations may avoid evaluating data filters when doing so would be expensive, but *must* evaluate the partition filters; returning AddFile entries that don't match the partition filters produces wrong results.
- Definition Classes
- CdcAddFileIndex → TahoeBatchFileIndex → TahoeFileIndex
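The contract above (partition filters mandatory, data filters optional) can be sketched as follows. StatsFile, MatchingFiles, and the applyDataFilters flag are hypothetical; real filters are Catalyst Expressions, and a data filter is modeled here as a conservative per-file check that returns false only when a file cannot contain matching rows.

```scala
// Hypothetical file entry with a single statistic used for data skipping.
final case class StatsFile(path: String, partitionValues: Map[String, String], numRecords: Long)

object MatchingFiles {
  type PartitionFilter = Map[String, String] => Boolean
  // Conservative: false only if the file cannot contain any matching row.
  type DataFilter = StatsFile => Boolean

  def matchingFiles(
      files: Seq[StatsFile],
      partitionFilters: Seq[PartitionFilter],
      dataFilters: Seq[DataFilter],
      applyDataFilters: Boolean): Seq[StatsFile] = {
    // Partition filters are always applied: returning a file outside the
    // requested partitions would produce wrong results.
    val inPartitions = files.filter(f => partitionFilters.forall(p => p(f.partitionValues)))
    // Data filters are an optional pruning step.
    if (applyDataFilters) inPartitions.filter(f => dataFilters.forall(d => d(f)))
    else inPartitions
  }
}
```

In this sketch, skipping data filters only returns extra candidate files that a later row-level filter would still have to discard, whereas skipping partition filters would change the result.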
- def metadata: Metadata
- Definition Classes
- TahoeFileIndexWithSnapshotDescriptor → SnapshotDescriptor
- def metadataOpsTimeNs: Option[Long]
- Definition Classes
- FileIndex
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def numOfFilesIfKnown: Option[Long]
- Attributes
- protected[delta]
- Definition Classes
- TahoeFileIndexWithSnapshotDescriptor → SnapshotDescriptor
- val partitionFiltersGenerated: Boolean
- Definition Classes
- TahoeBatchFileIndex
- val partitionSchema: StructType
- Definition Classes
- CdcAddFileIndex → TahoeFileIndex → FileIndex
- val path: Path
- Definition Classes
- TahoeFileIndex
- def protocol: Protocol
- Definition Classes
- TahoeFileIndexWithSnapshotDescriptor → SnapshotDescriptor
- def refresh(): Unit
- Definition Classes
- TahoeBatchFileIndex → FileIndex
- def rootPaths: Seq[Path]
- Definition Classes
- TahoeFileIndex → FileIndex
- val rowIndexFilters: Option[Map[String, RowIndexFilterType]]
If we know a priori which exact rows we want to read (e.g., from a previous scan), the per-file filter is found here and must be passed down to the appropriate reader.
- returns
a mapping from file names to the row index filter for that file.
- Definition Classes
- CdcAddFileIndex → SupportsRowIndexFilters
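A minimal sketch of looking up a file's row index filter before handing it to the reader. It assumes the URI-encoded keys follow java.net.URI percent-encoding of a relative path, which may differ from the encoding Delta actually uses; RowFilterKind and its cases are hypothetical stand-ins for RowIndexFilterType.

```scala
import java.net.URI

// Hypothetical stand-in for Delta's RowIndexFilterType.
sealed trait RowFilterKind
case object KeepMarkedRows extends RowFilterKind
case object DropMarkedRows extends RowFilterKind

object RowIndexFilterLookup {
  // Percent-encode a relative file path, e.g. a space becomes "%20".
  // Assumption: the map keys were produced with the same encoding.
  def uriEncode(path: String): String = new URI(null, null, path, null).toString

  // No map at all, or no entry for this file, means "read all rows".
  def filterFor(
      rowIndexFilters: Option[Map[String, RowFilterKind]],
      filePath: String): Option[RowFilterKind] =
    rowIndexFilters.flatMap(_.get(uriEncode(filePath)))
}
```

Encoding the raw path before the lookup matters: a file name containing a space would otherwise never match its encoded key.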
- def schema: StructType
- Definition Classes
- SnapshotDescriptor
- lazy val sizeInBytes: Long
- Definition Classes
- TahoeBatchFileIndex → FileIndex
- def sizeInBytesIfKnown: Option[Long]
- Attributes
- protected[delta]
- Definition Classes
- TahoeFileIndexWithSnapshotDescriptor → SnapshotDescriptor
- val snapshot: SnapshotDescriptor
- Definition Classes
- TahoeBatchFileIndex
- val spark: SparkSession
- Definition Classes
- TahoeFileIndex
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- TahoeFileIndex → FileIndex → AnyRef → Any
- def version: Long
- Definition Classes
- TahoeFileIndexWithSnapshotDescriptor → SnapshotDescriptor
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()