class CdcAddFileIndex extends TahoeBatchFileIndex
A TahoeFileIndex for scanning a sequence of added files as CDC. Similar to TahoeBatchFileIndex, with a bit of special handling to attach the log version and CDC type on a per-file basis.
- CdcAddFileIndex
- TahoeBatchFileIndex
- TahoeFileIndexWithSnapshotDescriptor
- TahoeFileIndex
- SnapshotDescriptor
- SupportsRowIndexFilters
- FileIndex
- AnyRef
- Any
Instance Constructors
- new CdcAddFileIndex(spark: SparkSession, filesByVersion: Seq[CDCDataSpec[AddFile]], deltaLog: DeltaLog, path: Path, snapshot: SnapshotDescriptor, rowIndexFilters: Option[Map[String, RowIndexFilterType]] = None)
- spark
The Spark session.
- filesByVersion
Grouped FileActions, one per table version.
- deltaLog
The delta log instance.
- path
The table's data path.
- snapshot
The snapshot where we read CDC from.
- rowIndexFilters
Map from URI-encoded file path to a row index filter type.
Note: Please also consider other CDC-related file indexes like TahoeChangeFileIndex and TahoeRemoveFileIndex when modifying this file index.
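The per-file handling described for this class can be pictured with a small, dependency-free sketch. It is not the Delta implementation: `FileEntry` and `VersionedFiles` are hypothetical stand-ins for AddFile and CDCDataSpec[AddFile], and carrying the version and CDC type as extra partition values is an assumption made for illustration. The column names `_commit_version` and `_change_type` and the `insert` label come from Delta's change data feed schema.

```scala
object CdcTaggingSketch {
  // Hypothetical stand-ins for AddFile and CDCDataSpec[AddFile].
  final case class FileEntry(path: String, partitionValues: Map[String, String])
  final case class VersionedFiles(version: Long, files: Seq[FileEntry])

  // Change data feed column names; added files are reported as inserts.
  val CommitVersion = "_commit_version"
  val ChangeType = "_change_type"

  // Assumption: metadata travels with each file as extra partition values.
  def tagWithCdcMetadata(filesByVersion: Seq[VersionedFiles]): Seq[FileEntry] =
    filesByVersion.flatMap { group =>
      group.files.map { f =>
        f.copy(partitionValues = f.partitionValues +
          (CommitVersion -> group.version.toString) +
          (ChangeType -> "insert"))
      }
    }
}
```

Each returned entry keeps its original partition values and gains the version of the group it came from, so a reader could surface both as ordinary columns.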
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def absolutePath(child: String): Path
- Definition Classes
- TahoeFileIndex
- val actionType: String
- Definition Classes
- TahoeBatchFileIndex
- val addFiles: Seq[AddFile]
- Definition Classes
- TahoeBatchFileIndex
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- val deltaLog: DeltaLog
- Definition Classes
- TahoeFileIndex → SnapshotDescriptor
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def fileStatusWithMetadataFromAddFile(addFile: AddFile): FileStatusWithMetadata
Generates a FileStatusWithMetadata using data extracted from a given AddFile.
- Definition Classes
- TahoeFileIndex
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def getBasePath(filePath: Path): Option[Path]
Returns the path of the base directory of the given file path (i.e. its parent directory with all the partition directories stripped off).
- Definition Classes
- TahoeFileIndex
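To illustrate what "stripped off" means in the getBasePath description, here is a rough sketch on plain strings. It assumes Hive-style `column=value` partition directories and uses a hypothetical helper `basePath`; the real method works on Hadoop Path values and may compute the result differently.

```scala
object BasePathSketch {
  // Drops the file name, then every trailing directory that looks like
  // `column=value`. Returns None when no base directory remains.
  def basePath(filePath: String): Option[String] = {
    val segments = filePath.split('/').toVector
    if (segments.length < 2) None
    else {
      val dirs = segments.init // everything except the file name
      val base = dirs.reverse.dropWhile(_.contains("=")).reverse
      if (base.isEmpty) None
      else {
        val joined = base.mkString("/")
        Some(if (joined.isEmpty) "/" else joined) // a lone "" segment is the root
      }
    }
  }
}
```

The heuristic treats any directory name containing `=` as a partition directory, which is a simplification: a table root whose own name contains `=` would be stripped too.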
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getPartitionValuesRow(partitionValues: Map[String, String]): GenericInternalRow
- Attributes
- protected
- Definition Classes
- TahoeFileIndex
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def inputFiles: Array[String]
- Definition Classes
- CdcAddFileIndex → TahoeBatchFileIndex → FileIndex
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def listFiles(partitionFilters: Seq[Expression], dataFilters: Seq[Expression]): Seq[PartitionDirectory]
- Definition Classes
- TahoeFileIndex → FileIndex
- def listPartitionsAsAddFiles(partitionFilters: Seq[Expression], dataFilters: Seq[Expression]): (Seq[(InternalRow, Seq[AddFile])], Seq[AddFile])
Returns (i) tuples of partition directories to their respective AddFile actions and (ii) a collection of matched AddFiles. The matched AddFiles are those that meet the criteria set by the partition and data filters. Essentially, this is a collection of all the files associated with the identified partitions.
- Definition Classes
- TahoeFileIndex
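The shape of the listPartitionsAsAddFiles result can be sketched without Spark. In this simplified model, `FileEntry` is a hypothetical stand-in for AddFile, a `Map[String, String]` replaces the InternalRow of partition values, and a single predicate replaces the two filter sequences.

```scala
object PartitionListingSketch {
  // Hypothetical stand-in for AddFile.
  final case class FileEntry(path: String, partitionValues: Map[String, String])

  // Returns (i) matched files grouped by partition values and
  // (ii) the same matched files as one flat collection.
  def listPartitions(
      files: Seq[FileEntry],
      partitionFilter: Map[String, String] => Boolean
  ): (Seq[(Map[String, String], Seq[FileEntry])], Seq[FileEntry]) = {
    val matched = files.filter(f => partitionFilter(f.partitionValues))
    val byPartition = matched.groupBy(_.partitionValues).toSeq
    (byPartition, matched)
  }
}
```

The second element is the union of the file lists in the first, which matches the description of it as all files associated with the identified partitions.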
- def makePartitionDirectories(partitionValuesToFiles: Seq[(InternalRow, Seq[AddFile])]): Seq[PartitionDirectory]
- Definition Classes
- TahoeFileIndex
- def matchingFiles(partitionFilters: Seq[Expression], dataFilters: Seq[Expression]): Seq[AddFile]
Returns all matching/valid files by the given partitionFilters and dataFilters. Implementations may avoid evaluating data filters when doing so would be expensive, but *must* evaluate the partition filters; wrong results will be produced if AddFile entries which don't match the partition filters are returned.
- Definition Classes
- CdcAddFileIndex → TahoeBatchFileIndex → TahoeFileIndex
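The matchingFiles contract above (partition filters are mandatory, data filters are best effort) might look like this in a reduced model. `FileEntry`, its `rowCount` field, and the predicate-based signature are hypothetical simplifications; the real method takes Catalyst Expression sequences.

```scala
object MatchingFilesSketch {
  // Hypothetical stand-in for AddFile with one piece of per-file statistics.
  final case class FileEntry(
      path: String,
      partitionValues: Map[String, String],
      rowCount: Long)

  def matchingFiles(
      files: Seq[FileEntry],
      partitionFilter: Map[String, String] => Boolean,
      dataFilter: Option[FileEntry => Boolean] = None
  ): Seq[FileEntry] = {
    // Mandatory: a file that fails the partition filter must never be returned.
    val partitionMatched = files.filter(f => partitionFilter(f.partitionValues))
    // Optional: skipping this step only returns extra files, never wrong ones.
    dataFilter.fold(partitionMatched)(p => partitionMatched.filter(p))
  }
}
```

Skipping the data filter is safe in this model because the reader still applies it to rows later; skipping the partition filter is not, since partition values are not re-checked against file contents.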
- def metadata: Metadata
- Definition Classes
- TahoeFileIndexWithSnapshotDescriptor → SnapshotDescriptor
- def metadataOpsTimeNs: Option[Long]
- Definition Classes
- FileIndex
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def numOfFilesIfKnown: Option[Long]
- Attributes
- protected[delta]
- Definition Classes
- TahoeFileIndexWithSnapshotDescriptor → SnapshotDescriptor
- val partitionFiltersGenerated: Boolean
- Definition Classes
- TahoeBatchFileIndex
- val partitionSchema: StructType
- Definition Classes
- CdcAddFileIndex → TahoeFileIndex → FileIndex
- val path: Path
- Definition Classes
- TahoeFileIndex
- def protocol: Protocol
- Definition Classes
- TahoeFileIndexWithSnapshotDescriptor → SnapshotDescriptor
- def refresh(): Unit
- Definition Classes
- TahoeBatchFileIndex → FileIndex
- def rootPaths: Seq[Path]
- Definition Classes
- TahoeFileIndex → FileIndex
- val rowIndexFilters: Option[Map[String, RowIndexFilterType]]
If we know a-priori which exact rows we want to read (e.g., from a previous scan) find the per-file filter here, which must be passed down to the appropriate reader.
- returns
a mapping from file names to the row index filter for that file.
- Definition Classes
- CdcAddFileIndex → SupportsRowIndexFilters
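A per-file lookup against a map shaped like rowIndexFilters could be sketched as follows. `FilterType` and its two cases are hypothetical placeholders for RowIndexFilterType, and encoding the path with `java.net.URI` is an assumption about how the URI-encoded keys are produced.

```scala
object RowIndexFilterSketch {
  // Hypothetical placeholder for RowIndexFilterType.
  sealed trait FilterType
  case object KeepListedRows extends FilterType
  case object DropListedRows extends FilterType

  // Assumption: keys are path strings percent-encoded the way java.net.URI does it.
  def encode(path: String): String =
    new java.net.URI(null, null, path, null).toString

  // None means "no filter known for this file", so the reader scans every row.
  def filterFor(
      filters: Option[Map[String, FilterType]],
      rawPath: String
  ): Option[FilterType] =
    filters.flatMap(_.get(encode(rawPath)))
}
```

Both an absent map and a missing key collapse to `None` here, so the caller only needs one fallback path.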
- def schema: StructType
- Definition Classes
- SnapshotDescriptor
- lazy val sizeInBytes: Long
- Definition Classes
- TahoeBatchFileIndex → FileIndex
- def sizeInBytesIfKnown: Option[Long]
- Attributes
- protected[delta]
- Definition Classes
- TahoeFileIndexWithSnapshotDescriptor → SnapshotDescriptor
- val snapshot: SnapshotDescriptor
- Definition Classes
- TahoeBatchFileIndex
- val spark: SparkSession
- Definition Classes
- TahoeFileIndex
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- TahoeFileIndex → FileIndex → AnyRef → Any
- def version: Long
- Definition Classes
- TahoeFileIndexWithSnapshotDescriptor → SnapshotDescriptor
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()