class DeltaPurgeOperation extends DeltaReorgOperation with ReorgTableHelper
Reorg operation that purges files containing soft-deleted rows. It also tries to find and remove dropped columns from the Parquet files, that is, any column still stored in a file that is no longer present in the current table schema.
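In Delta Lake SQL, purging is requested through the PURGE mode of REORG TABLE, which this operation appears to back. The sketch below only builds that statement as a string. The helper name purgeStatement and the table name events are hypothetical; the result would be passed to spark.sql on a session with Delta Lake enabled.

```scala
object PurgeStatement {
  // Hypothetical helper: builds a REORG TABLE ... APPLY (PURGE) statement.
  // The optional predicate narrows which files are considered for rewriting.
  def purgeStatement(table: String, predicate: Option[String] = None): String = {
    val where = predicate.map(p => s" WHERE $p").getOrElse("")
    s"REORG TABLE $table$where APPLY (PURGE)"
  }
}
```

Usage, assuming an existing SparkSession named spark: spark.sql(PurgeStatement.purgeStatement("events")).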
- DeltaPurgeOperation
- ReorgTableHelper
- Serializable
- Serializable
- DeltaReorgOperation
- AnyRef
- Any
Instance Constructors
- new DeltaPurgeOperation()
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
fileHasDifferentTypes(fileSchema: StructType, tablePhysicalSchema: StructType): Boolean
Determine whether fileSchema has any column that has a type that differs from tablePhysicalSchema.
- fileSchema
the current parquet schema to be checked.
- tablePhysicalSchema
the current table schema.
- returns
whether the file has any column whose type differs from the corresponding table column.
- Attributes
- protected
- Definition Classes
- ReorgTableHelper
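The check described above can be sketched as follows. This is a simplified illustration, not the library's implementation: it assumes spark-sql is on the classpath, compares top-level columns only, matches them by exact name, and ignores columns that appear in just one schema. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.types._

object TypeDiffSketch {
  // Simplified: top-level columns only, matched by exact name.
  // Columns missing from the table schema are not reported here.
  def hasDifferentTypes(fileSchema: StructType, tableSchema: StructType): Boolean = {
    val tableTypes: Map[String, DataType] =
      tableSchema.fields.map(f => f.name -> f.dataType).toMap
    fileSchema.fields.exists { f =>
      // True when the table has this column but with another type.
      tableTypes.get(f.name).exists(_ != f.dataType)
    }
  }
}
```

For example, a file that stores id as IntegerType while the table declares LongType would be flagged.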
-
def
fileHasExtraColumns(fileSchema: StructType, tablePhysicalSchema: StructType, protocol: Protocol, metadata: Metadata): Boolean
Determine whether fileSchema has any column that does not exist in tablePhysicalSchema. This can happen after running ALTER TABLE commands, e.g., ALTER TABLE DROP COLUMN.
- fileSchema
the current parquet schema to be checked.
- tablePhysicalSchema
the current table schema.
- protocol
the protocol used to check row_id and row_commit_version.
- metadata
the metadata used to check row_id and row_commit_version.
- returns
whether the file has any dropped column.
- Attributes
- protected
- Definition Classes
- ReorgTableHelper
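A minimal sketch of the dropped-column check, assuming spark-sql is on the classpath. It looks at top-level physical names only. The ignored parameter is a hypothetical stand-in for the row_id and row_commit_version handling that the real method derives from protocol and metadata; how those columns are actually resolved is not shown in this page.

```scala
import org.apache.spark.sql.types._

object ExtraColumnsSketch {
  // Simplified: top-level names only. `ignored` is a hypothetical allow-list
  // for columns that may live in the file without being in the table schema.
  def hasExtraColumns(
      fileSchema: StructType,
      tableSchema: StructType,
      ignored: Set[String] = Set.empty): Boolean = {
    val known = tableSchema.fieldNames.toSet ++ ignored
    fileSchema.fieldNames.exists(name => !known.contains(name))
  }
}
```

A file written before ALTER TABLE DROP COLUMN still carries the dropped column, so this check would return true for it until the file is rewritten.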
-
def
filterFilesToReorg(spark: SparkSession, snapshot: Snapshot, files: Seq[AddFile]): Seq[AddFile]
Collects files that need to be processed by the reorg operation from the list of candidate files.
- Definition Classes
- DeltaPurgeOperation → DeltaReorgOperation
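Going by the class description, a purge would select files that either contain soft-deleted rows or still store dropped columns. The sketch below illustrates that selection with a hypothetical Candidate type standing in for AddFile plus the column names read from its Parquet footer; it is an assumption about the selection rule, not the actual implementation.

```scala
object PurgeFilterSketch {
  // Hypothetical, simplified stand-in for an AddFile and its Parquet schema.
  final case class Candidate(
      path: String,
      softDeletedRows: Long,
      physicalColumns: Set[String])

  // Keep files with soft-deleted rows or with columns unknown to the table.
  def filesToPurge(candidates: Seq[Candidate], tableColumns: Set[String]): Seq[Candidate] =
    candidates.filter { c =>
      c.softDeletedRows > 0 || !c.physicalColumns.subsetOf(tableColumns)
    }
}
```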
-
def
filterParquetFiles(files: Seq[AddFile], dataPath: Path, configuration: Configuration, ignoreCorruptFiles: Boolean, assumeBinaryIsString: Boolean, assumeInt96IsTimestamp: Boolean)(filterFileFn: (StructType) ⇒ Boolean): Seq[AddFile]
- Attributes
- protected
- Definition Classes
- ReorgTableHelper
-
def
filterParquetFilesOnExecutors(spark: SparkSession, files: Seq[AddFile], snapshot: Snapshot, ignoreCorruptFiles: Boolean)(filterFileFn: (StructType) ⇒ Boolean): Seq[AddFile]
Filter the list of AddFile, keeping only the files whose physical Parquet schema satisfies the given filter function.
Note: Filtering happens on the executors: **any variable captured by filterFileFn must be Serializable**
- Attributes
- protected
- Definition Classes
- ReorgTableHelper
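One way to respect the serialization note is to copy the values the filter needs into plain immutable locals before building the function. The sketch below assumes spark-sql is on the classpath; TableInfo and buildFilter are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.types.StructType

object FilterFnSketch {
  // Hypothetical driver-side holder; deliberately not Serializable.
  class TableInfo(val columns: Seq[String])

  def buildFilter(info: TableInfo): StructType => Boolean = {
    // Copy what the filter needs into an immutable Set first, so the
    // returned function captures `known` rather than `info`.
    val known: Set[String] = info.columns.toSet
    fileSchema => fileSchema.fieldNames.exists(n => !known.contains(n))
  }
}
```

The returned function could then be supplied as filterFileFn; referencing info.columns directly inside the function body would instead pull the non-serializable holder into the closure.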
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()