
package perf

Type Members

  1. case class DeltaOptimizedWriterExec(child: SparkPlan, partitionColumns: Seq[String], deltaLog: DeltaLog) extends SparkPlan with UnaryExecNode with DeltaLogging with Product with Serializable


    An execution node which shuffles data to a target output of DELTA_OPTIMIZE_WRITE_SHUFFLE_BLOCKS blocks, hash partitioned on the table partition columns. We group all blocks by their reducer IDs and bin-pack them into bins whose target size is DELTA_OPTIMIZE_WRITE_BIN_SIZE. Then we launch one Spark task per bin, and each task writes out a single file for its bin.

    child

    The child execution plan that produces the rows to be written.

    partitionColumns

    The partition columns of the table. Used to hash partition the write.

    deltaLog

    The DeltaLog for the table. Used for logging only.
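    The grouping and bin-packing step can be illustrated with a minimal sketch. It assumes a hypothetical ShuffleBlock(reducerId, mapId, sizeBytes) descriptor and a simple greedy packing that fills bins in block order and starts a new bin once the target size would be exceeded; the names and the packing heuristic are illustrative assumptions, not Delta's actual implementation.

```scala
// Hypothetical descriptor for one shuffle block (not a Delta or Spark type).
final case class ShuffleBlock(reducerId: Int, mapId: Int, sizeBytes: Long)

// Group blocks by reducer ID, then greedily pack each reducer's blocks into bins
// whose total size stays at or below binSizeBytes. A single block larger than
// binSizeBytes still gets a bin of its own. Each resulting bin maps to one write task.
def binPackByReducer(blocks: Seq[ShuffleBlock], binSizeBytes: Long): Seq[Seq[ShuffleBlock]] =
  blocks.groupBy(_.reducerId).toSeq.sortBy(_._1).flatMap { case (_, reducerBlocks) =>
    val bins = scala.collection.mutable.ArrayBuffer.empty[Vector[ShuffleBlock]]
    var current = Vector.empty[ShuffleBlock]
    var currentSize = 0L
    for (block <- reducerBlocks) {
      // Close the current bin if adding this block would overflow it.
      if (current.nonEmpty && currentSize + block.sizeBytes > binSizeBytes) {
        bins += current
        current = Vector.empty
        currentSize = 0L
      }
      current :+= block
      currentSize += block.sizeBytes
    }
    if (current.nonEmpty) bins += current
    bins.toSeq
  }
```

    Because packing happens per reducer ID, a bin never mixes blocks from different hash partitions, so every output file holds data for a single partition bucket.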

  2. trait OptimizeMetadataOnlyDeltaQuery extends LoggingShims


    Optimize COUNT, MIN and MAX expressions on Delta tables. This optimization is only applied when all of the following conditions are met:
      - The MIN/MAX columns are not nested, and their data type is supported by the optimization (ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, DateType).
      - Every AddFile in the Delta log has stats on the columns used in MIN/MAX expressions, or those columns are partition columns, in which case the values are taken from partitionValues, a required field of AddFile.
      - The table has no deletion vectors, or the query has no MIN/MAX expressions.
      - COUNT has no DISTINCT.
      - The query has no filters.
      - The query has no GROUP BY.
    Example of a valid query: SELECT COUNT(*), MIN(id), MAX(partition_col) FROM MyDeltaTable
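    The idea behind answering such a query from metadata alone can be sketched as follows. The sketch assumes a hypothetical FileStats record carrying a per-file row count and min/max for a single Long column; it ignores deletion vectors, partition columns and the other supported types, and it is not Delta's actual code. COUNT(*) becomes a sum of per-file counts, MIN and MAX become a min of mins and a max of maxes, and a single file without the needed stats forces a fallback to a normal scan.

```scala
// Hypothetical per-file statistics for one Long column (e.g. "id").
final case class FileStats(numRecords: Option[Long], minId: Option[Long], maxId: Option[Long])

// Result of SELECT COUNT(*), MIN(id), MAX(id); None models SQL NULL on an empty table.
final case class Answer(count: Long, min: Option[Long], max: Option[Long])

// Returns None when any file lacks a required statistic, meaning the query
// must fall back to reading the data files.
def answerFromStats(files: Seq[FileStats]): Option[Answer] =
  if (files.exists(f => f.numRecords.isEmpty || f.minId.isEmpty || f.maxId.isEmpty)) None
  else {
    val count = files.flatMap(_.numRecords).sum
    val min = files.flatMap(_.minId).reduceOption(_ min _)
    val max = files.flatMap(_.maxId).reduceOption(_ max _)
    Some(Answer(count, min, max))
  }
```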

  3. class OptimizedWriterBlocks extends AnyRef


    A wrapper class to make the blocks non-serializable. If the blocks were serialized and sent to the executors, they could cause memory problems. NOTE: Wrapping the Array in a non-serializable class means that any serializable class holding it must mark that field transient (otherwise serialization fails), which gives us extra protection against a developer mistake.
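    The pattern can be shown with a minimal sketch using plain Java serialization. BlockInfo, BlocksWrapper and PlanNodeLike are hypothetical stand-ins, not Delta's classes: the wrapper refuses to serialize, so the only way for a serializable holder to survive serialization is to declare the field @transient.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical block descriptor; serializable on its own, like most case classes.
final case class BlockInfo(reducerId: Int, sizeBytes: Long)

// Deliberately does NOT extend Serializable, so shipping it fails fast.
final class BlocksWrapper(val blocks: Array[BlockInfo])

// Driver-side holder: the field is @transient, so serialization skips the blocks.
final class PlanNodeLike(@transient val wrapper: BlocksWrapper) extends Serializable

// True if Java serialization of obj succeeds, false on NotSerializableException.
def isJavaSerializable(obj: AnyRef): Boolean = {
  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  try {
    out.writeObject(obj)
    true
  } catch {
    case _: NotSerializableException => false
  } finally {
    out.close()
  }
}
```

    Dropping @transient from PlanNodeLike would make serializing it throw NotSerializableException at runtime, which is exactly the mistake the wrapper is meant to surface.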
