Packages

package optimize


Type Members

  1. case class AddFileWithNumRecords(addFile: AddFile, numPhysicalRecords: Long, numLogicalRecords: Long) extends Product with Serializable

    Wrapper over an [AddFile] and its stats:

    numPhysicalRecords

    The number of records physically present in the file. Equivalent to addFile.numTotalRecords.

    numLogicalRecords

    The physical number of records minus the Deletion Vector cardinality. Equivalent to addFile.numRecords.
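    The relationship between the two counts can be sketched as follows. This is a minimal illustration, not the library's code: FileInfo and deletionVectorCardinality are hypothetical stand-ins for the real AddFile action and its Deletion Vector descriptor.

```scala
object LogicalRecordsSketch {
  // Hypothetical, simplified stand-in for AddFile; only the two values
  // needed for this illustration are modelled.
  final case class FileInfo(numPhysicalRecords: Long, deletionVectorCardinality: Long)

  // Logical records = rows physically present in the file
  // minus rows marked as deleted by the Deletion Vector.
  def numLogicalRecords(f: FileInfo): Long =
    f.numPhysicalRecords - f.deletionVectorCardinality
}
```

    Under these assumptions, a file with 1000 physical records and a Deletion Vector covering 25 rows has 975 logical records.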

  2. case class AutoCompactParallelismStats(maxClusterUsedParallelism: Long = 0, minClusterUsedParallelism: Long = 0, maxSessionUsedParallelism: Long = 0, minSessionUsedParallelism: Long = 0) extends Product with Serializable

    This statistics class tracks the parallelism usage of Auto Compaction. It collects the following metrics: the min/max parallelism used for Auto Compact across the whole cluster, and the min/max parallelism occupied by the current Auto Compact session.

  3. case class DeletionVectorStats(numDeletionVectorsRemoved: Long = 0, numDeletionVectorRowsRemoved: Long = 0) extends Product with Serializable

    Accumulator for statistics related to Deletion Vectors. Note that this case class contains mutable variables and cannot be used in places where immutable case classes can be used (e.g. map/set keys).
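    The map/set key caveat comes from how case classes compute hashCode from their fields. The sketch below uses DvStats, a hypothetical stand-in declared with var fields (an assumption about how the mutability is expressed), to show that mutating an instance changes its hash code, so a hash-based collection may no longer find it.

```scala
object MutableKeySketch {
  // Hypothetical stand-in: a case class whose fields are mutable.
  final case class DvStats(
      var numDeletionVectorsRemoved: Long = 0,
      var numDeletionVectorRowsRemoved: Long = 0)

  // Returns the hash code of one instance before and after mutating it.
  def hashBeforeAndAfter(): (Int, Int) = {
    val stats = DvStats()
    val before = stats.hashCode()
    stats.numDeletionVectorsRemoved += 1 // accumulate in place
    (before, stats.hashCode())
  }
}
```

    Because the two hash codes differ, an instance stored in a HashSet or used as a HashMap key before the update would be looked up in the wrong bucket afterwards.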

  4. case class FileSizeMetrics(min: Option[Long], max: Option[Long], avg: Double, totalFiles: Long, totalSize: Long) extends Product with Serializable

    Basic Stats on file sizes.

    min

    Minimum file size

    max

    Maximum file size

    avg

    Average file size

    totalFiles

    Total number of files

    totalSize

    Total size of the files
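    As an illustration of how these fields relate, the sketch below derives them from a list of file sizes in bytes. The case class is a local copy of the documented shape; fromSizes is a hypothetical helper, and returning None for min/max and 0.0 for avg on empty input is an assumption rather than documented behavior.

```scala
object FileSizeMetricsSketch {
  // Local copy of the documented shape so the sketch is self-contained.
  final case class FileSizeMetrics(
      min: Option[Long],
      max: Option[Long],
      avg: Double,
      totalFiles: Long,
      totalSize: Long)

  // Hypothetical helper: aggregate a batch of file sizes (in bytes).
  def fromSizes(sizes: Seq[Long]): FileSizeMetrics = {
    val totalFiles = sizes.length.toLong
    val totalSize = sizes.sum
    FileSizeMetrics(
      min = if (sizes.isEmpty) None else Some(sizes.min),
      max = if (sizes.isEmpty) None else Some(sizes.max),
      avg = if (sizes.isEmpty) 0.0 else totalSize.toDouble / totalFiles,
      totalFiles = totalFiles,
      totalSize = totalSize)
  }
}
```

    Modelling min and max as Option[Long] lets an empty batch be represented without inventing a sentinel size.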

  5. case class FileSizeStats(minFileSize: Long = 0, maxFileSize: Long = 0, totalFiles: Long = 0, totalSize: Long = 0) extends Product with Serializable
  6. case class FileSizeStatsWithHistogram(min: Long, p25: Long, p50: Long, p75: Long, max: Long) extends Product with Serializable

    Percentiles on the file sizes in this batch.

    min

    Size of the smallest file

    p25

    Size of the 25th percentile file

    p50

    Size of the 50th percentile file

    p75

    Size of the 75th percentile file

    max

    Size of the largest file
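    One way to obtain such percentiles is the nearest-rank method over the sorted sizes, sketched below. The percentile definition, the Histogram name, and the fromSizes helper are all assumptions made for illustration; the page does not state how the library computes these values.

```scala
object FileSizePercentilesSketch {
  // Local stand-in mirroring the documented fields.
  final case class Histogram(min: Long, p25: Long, p50: Long, p75: Long, max: Long)

  // Nearest-rank percentile over an already sorted, non-empty sequence.
  private def percentile(sorted: IndexedSeq[Long], p: Int): Long = {
    val rank = math.ceil(p / 100.0 * sorted.length).toInt
    sorted(math.max(rank, 1) - 1)
  }

  // Hypothetical helper: returns None when there are no files to summarize.
  def fromSizes(sizes: Seq[Long]): Option[Histogram] =
    if (sizes.isEmpty) None
    else {
      val sorted = sizes.sorted.toIndexedSeq
      Some(Histogram(
        min = sorted.head,
        p25 = percentile(sorted, 25),
        p50 = percentile(sorted, 50),
        p75 = percentile(sorted, 75),
        max = sorted.last))
    }
}
```

    With nearest-rank, every reported percentile is the size of an actual file in the batch, which matches the wording "Size of the 25th percentile file".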

  7. case class OptimizeMetrics(numFilesAdded: Long, numFilesRemoved: Long, filesAdded: FileSizeMetrics = ..., filesRemoved: FileSizeMetrics = ..., partitionsOptimized: Long = 0, zOrderStats: Option[ZOrderStats] = None, clusteringStats: Option[ClusteringStats] = None, numBins: Long, numBatches: Long, totalConsideredFiles: Long, totalFilesSkipped: Long = 0, preserveInsertionOrder: Boolean = false, numFilesSkippedToReduceWriteAmplification: Long = 0, numBytesSkippedToReduceWriteAmplification: Long = 0, startTimeMs: Long = 0, endTimeMs: Long = 0, totalClusterParallelism: Long = 0, totalScheduledTasks: Long = 0, autoCompactParallelismStats: Option[ParallelismMetrics] = None, deletionVectorStats: Option[DeletionVectorStats] = None, numTableColumns: Long = 0, numTableColumnsWithStats: Long = 0) extends Product with Serializable

    Metrics returned by the optimize command.

    numFilesAdded

    Number of files added by optimize

    numFilesRemoved

    Number of files removed by optimize

    filesAdded

    Stats for the files added

    filesRemoved

    Stats for the files removed

    partitionsOptimized

    Number of partitions optimized

    zOrderStats

    Z-Order stats

    clusteringStats

    Clustering stats

    numBins

    Number of bins

    numBatches

    Number of batches

    totalConsideredFiles

    Number of files considered for the Optimize operation.

    totalFilesSkipped

    Number of files that are skipped from being Optimized.

    preserveInsertionOrder

    Whether optimize was run with insertion-order preservation enabled.

    numFilesSkippedToReduceWriteAmplification

    Number of files skipped for reducing write amplification.

    numBytesSkippedToReduceWriteAmplification

    Number of bytes skipped for reducing write amplification.

    startTimeMs

    The start time of Optimize command.

    endTimeMs

    The end time of Optimize command.

    totalClusterParallelism

    The total parallelism of this cluster.

    totalScheduledTasks

    The total number of optimize tasks scheduled.

    autoCompactParallelismStats

    The metrics of cluster and session parallelism.

    deletionVectorStats

    Statistics related to Deletion Vectors.

    numTableColumns

    Number of columns in the table.

    numTableColumnsWithStats

    Number of table columns for which data skipping stats are collected.

  8. case class OptimizeStats(addedFilesSizeStats: FileSizeStats = FileSizeStats(), removedFilesSizeStats: FileSizeStats = FileSizeStats(), numPartitionsOptimized: Long = 0, zOrderStats: Option[ZOrderStats] = None, clusteringStats: Option[ClusteringStats] = None, numBins: Long = 0, numBatches: Long = 0, totalConsideredFiles: Long = 0, totalFilesSkipped: Long = 0, preserveInsertionOrder: Boolean = false, numFilesSkippedToReduceWriteAmplification: Long = 0, numBytesSkippedToReduceWriteAmplification: Long = 0, startTimeMs: Long = System.currentTimeMillis(), endTimeMs: Long = 0, totalClusterParallelism: Long = 0, totalScheduledTasks: Long = 0, deletionVectorStats: Option[DeletionVectorStats] = None, numTableColumns: Long = 0, numTableColumnsWithStats: Long = 0, autoCompactParallelismStats: AutoCompactParallelismStats = AutoCompactParallelismStats()) extends Product with Serializable

    Stats for an OPTIMIZE operation accumulated across all batches.

  9. case class ParallelismMetrics(maxClusterActiveParallelism: Option[Long] = None, minClusterActiveParallelism: Option[Long] = None, maxSessionActiveParallelism: Option[Long] = None, minSessionActiveParallelism: Option[Long] = None) extends Product with Serializable

    This statistics class contains the following metrics: the min/max parallelism used across the whole cluster, and the min/max parallelism occupied by the current session.

  10. class ZCubeFileStatsCollector extends AnyRef

    ZCube file statistics collector. An object of this class can be used to collect ZCube statistics. The file statistics collection can be started by initializing an object of this class and calling updateStats on every new file seen. The number of ZCubes, number of files from matching cubes and number of unoptimized files are captured here.

  11. case class ZOrderFileStats(num: Long, size: Long) extends Product with Serializable

    Aggregated file stats for a category of ZCube files.

    num

    Total number of files.

    size

    Total size of files in bytes.

  12. class ZOrderMetrics extends AnyRef

    A class to create blob structure for zorder metrics and events.

  13. case class ZOrderStats(strategyName: String, inputCubeFiles: ZOrderFileStats, inputOtherFiles: ZOrderFileStats, inputNumCubes: Long, mergedFiles: ZOrderFileStats, numOutputCubes: Long, mergedNumCubes: Option[Long] = None) extends Product with Serializable

    Aggregated stats for the OPTIMIZE ZORDER BY command. This is a public-facing API; consider any change carefully.

    strategyName

    ZCubeMergeStrategy used.

    inputCubeFiles

    Files in the ZCube matching the current OPTIMIZE operation.

    inputOtherFiles

    Files not in any ZCube or in other ZCube orderings.

    inputNumCubes

    Number of different cubes among input files.

    mergedFiles

    Subset of input files merged by the current operation

    numOutputCubes

    Number of output ZCubes written out

    mergedNumCubes

    Number of different cubes among merged files.

Value Members

  1. object AddFileWithNumRecords extends Serializable
  2. object FileSizeStatsWithHistogram extends Serializable
  3. object ZOrderFileStats extends Serializable
