package perf
Type Members
-
case class
DeltaOptimizedWriterExec(child: SparkPlan, partitionColumns: Seq[String], deltaLog: DeltaLog) extends SparkPlan with UnaryExecNode with DeltaLogging with Product with Serializable
An execution node which shuffles data to a target output of DELTA_OPTIMIZE_WRITE_SHUFFLE_BLOCKS blocks, hash partitioned on the table partition columns. We group all blocks by their reducer_id values and bin-pack them into DELTA_OPTIMIZE_WRITE_BIN_SIZE bins. Then we launch one Spark task per bin, which writes out a single file for that bin.
- child
The execution plan
- partitionColumns
The partition columns of the table. Used to hash partition the write.
- deltaLog
The DeltaLog for the table. Used for logging only.
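The grouping and bin-packing step described above can be sketched roughly as follows. This is a minimal illustration, not the Delta Lake implementation: the `Block` type, the `binPack` function, and the greedy next-fit strategy are all assumptions made for the example, as is the choice to never mix blocks from different reducers in one bin.

```scala
// Hypothetical sketch; names and packing strategy are illustrative only.
object BinPackSketch {
  // A shuffle block, identified by its reducer and mapper, with its size in bytes.
  final case class Block(reducerId: Int, mapId: Int, sizeBytes: Long)

  // Groups blocks by reducer, then packs each group greedily (next-fit):
  // a bin is closed as soon as the next block would push it past binSize.
  // A single block larger than binSize still gets a bin of its own.
  def binPack(blocks: Seq[Block], binSize: Long): Seq[Seq[Block]] = {
    blocks
      .groupBy(_.reducerId)
      .toSeq
      .sortBy(_._1)
      .flatMap { case (_, reducerBlocks) =>
        val bins = scala.collection.mutable.ArrayBuffer.empty[Vector[Block]]
        var current = Vector.empty[Block]
        var currentSize = 0L
        reducerBlocks.sortBy(_.mapId).foreach { b =>
          if (current.nonEmpty && currentSize + b.sizeBytes > binSize) {
            bins += current
            current = Vector.empty
            currentSize = 0L
          }
          current :+= b
          currentSize += b.sizeBytes
        }
        if (current.nonEmpty) bins += current
        bins.toSeq
      }
  }
}
```

Each inner sequence returned by `binPack` would correspond to one write task, and therefore to one output file.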
-
trait
OptimizeMetadataOnlyDeltaQuery extends LoggingShims
Optimize COUNT, MIN and MAX expressions on Delta tables. This optimization is only applied when all of the following conditions are met:
- The MIN/MAX columns are not nested and their data type is supported by the optimization (ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, DateType).
- All AddFiles in the Delta log have stats on the columns used in MIN/MAX expressions, or those columns are partition columns, in which case the optimization uses partitionValues, a required field.
- The table has no deletion vectors, or the query has no MIN/MAX expressions.
- COUNT has no DISTINCT.
- The query has no filters.
- The query has no GROUP BY.
Example of a valid query: SELECT COUNT(*), MIN(id), MAX(partition_col) FROM MyDeltaTable
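To illustrate why per-file statistics are enough to answer such a query, here is a minimal sketch that combines them without reading any data files. The `FileStats` and `Agg` types and the `aggregate` function are hypothetical and only loosely modeled on the idea of AddFile stats. The sketch assumes a single LongType column and treats any missing statistic as a reason to fall back to a normal scan.

```scala
// Hypothetical sketch; not the Delta Lake API.
object MetadataOnlyAggSketch {
  // Per-file statistics for one column; None means the stat is absent.
  final case class FileStats(numRecords: Option[Long], min: Option[Long], max: Option[Long])

  // Result of COUNT(*), MIN(col), MAX(col) over the whole table.
  final case class Agg(count: Long, min: Option[Long], max: Option[Long])

  // Returns None when any file lacks a required statistic, in which case
  // the caller would have to scan the data instead.
  def aggregate(files: Seq[FileStats]): Option[Agg] = {
    val complete = files.forall { f =>
      f.numRecords.isDefined && f.min.isDefined && f.max.isDefined
    }
    if (!complete) None
    else Some(Agg(
      count = files.flatMap(_.numRecords).sum,
      min = files.flatMap(_.min).reduceOption(_ min _),
      max = files.flatMap(_.max).reduceOption(_ max _)))
  }
}
```

For a table with no files, this sketch returns a count of 0 and empty MIN/MAX values, which mirrors how SQL aggregates behave on an empty input.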
-
class
OptimizedWriterBlocks extends AnyRef
A wrapper class to make the blocks non-serializable. If we serialize the blocks and send them to the executors, it may cause memory problems. NOTE: By wrapping the Array in a non-serializable class, we enforce that the field must be transient, which gives us extra protection against a developer making a mistake.
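The effect of such a wrapper can be demonstrated with plain Java serialization. The sketch below is illustrative only: `Blocks`, `GoodHolder`, `BadHolder`, and `trySerialize` are hypothetical names, not part of Delta Lake. It shows that a serializable class holding the wrapper can only be serialized if the field is marked `@transient`.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical sketch; class names are illustrative only.
object TransientWrapperSketch {
  // Deliberately NOT Serializable, so it can never be shipped by accident.
  class Blocks(val data: Array[Long])

  // The field is transient, so serialization skips it.
  class GoodHolder(@transient val blocks: Blocks) extends Serializable

  // The developer forgot @transient: serialization fails loudly.
  class BadHolder(val blocks: Blocks) extends Serializable

  // Returns true if the object can be written with Java serialization.
  def trySerialize(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

With a bare `Array` field, the forgotten annotation would go unnoticed and the array would be serialized silently. With the wrapper, the same mistake surfaces as a `NotSerializableException`.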