package optimize
Type Members
-
case class
AddFileWithNumRecords(addFile: AddFile, numPhysicalRecords: Long, numLogicalRecords: Long) extends Product with Serializable
Wrapper over an [AddFile] and its stats:
- numPhysicalRecords
The number of records physically present in the file. Equivalent to addFile.numTotalRecords.
- numLogicalRecords
The physical number of records minus the Deletion Vector cardinality. Equivalent to addFile.numRecords.
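The relationship between the two counts can be sketched as follows. This is a minimal illustration, not the library's code: `FileSketch`, `FileWithNumRecords`, `dvCardinality`, and `wrap` are hypothetical stand-ins introduced here, since the real `AddFile` type is not shown in this listing.

```scala
object AddFileRecordsSketch {
  // Hypothetical stand-in for the real AddFile action; only the fields needed here.
  final case class FileSketch(path: String, numPhysicalRecords: Long, dvCardinality: Long)

  // Mirrors the shape of AddFileWithNumRecords, using the stand-in file type.
  final case class FileWithNumRecords(
      file: FileSketch,
      numPhysicalRecords: Long,
      numLogicalRecords: Long)

  // Logical records = physical records minus the rows marked deleted by the Deletion Vector.
  def wrap(file: FileSketch): FileWithNumRecords =
    FileWithNumRecords(
      file,
      numPhysicalRecords = file.numPhysicalRecords,
      numLogicalRecords = file.numPhysicalRecords - file.dvCardinality)

  // A file with 1000 physical rows, 25 of which are marked deleted.
  val wrapped: FileWithNumRecords = wrap(FileSketch("part-0000.parquet", 1000L, 25L))
}
```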
-
case class
AutoCompactParallelismStats(maxClusterUsedParallelism: Long = 0, minClusterUsedParallelism: Long = 0, maxSessionUsedParallelism: Long = 0, minSessionUsedParallelism: Long = 0) extends Product with Serializable
This statistics class tracks the parallelism usage of Auto Compaction. It collects the following metrics: the min/max parallelism used for Auto Compact across the whole cluster, and the min/max parallelism occupied by the current Auto Compact session.
-
case class
DeletionVectorStats(numDeletionVectorsRemoved: Long = 0, numDeletionVectorRowsRemoved: Long = 0) extends Product with Serializable
Accumulator for statistics related to Deletion Vectors. Note that this case class contains mutable variables and cannot be used where immutable case classes are expected (e.g. as map or set keys).
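The caveat about map and set keys follows from how case classes derive `hashCode` from their fields. The sketch below uses a hypothetical stand-in, `DvStatsSketch`, with `var` fields (an assumption, as the signature above does not show which fields are mutable) to show the hash changing after a mutation.

```scala
object MutableKeySketch {
  // Hypothetical stand-in: a case class whose fields are mutable, as the note describes.
  final case class DvStatsSketch(
      var numDeletionVectorsRemoved: Long = 0,
      var numDeletionVectorRowsRemoved: Long = 0)

  val stats = DvStatsSketch()

  // The generated hashCode is computed from the current field values.
  val hashBefore: Int = stats.hashCode

  // Accumulating into the instance changes its fields, and therefore its hash.
  stats.numDeletionVectorsRemoved += 1
  val hashAfter: Int = stats.hashCode

  // A hash-based collection that stored the instance under hashBefore may no longer
  // find it once the hash has changed.
  val hashChanged: Boolean = hashBefore != hashAfter
}
```

Keeping such accumulators as local values, and copying the final counts into an immutable structure before using them as keys, avoids the problem.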
-
case class
FileSizeMetrics(min: Option[Long], max: Option[Long], avg: Double, totalFiles: Long, totalSize: Long) extends Product with Serializable
Basic Stats on file sizes.
- min
Minimum file size
- max
Maximum file size
- avg
Average of the file size
- totalFiles
Total number of files
- totalSize
Total size of the files
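As an illustration of how these five fields relate, the following sketch derives them from a list of file sizes. `SizeMetrics` and `summarize` are hypothetical names, and the handling of an empty input (`None` for min/max, `0.0` for avg) is an assumption rather than documented behavior.

```scala
object FileSizeMetricsSketch {
  // Stand-in with the same field names as FileSizeMetrics.
  final case class SizeMetrics(
      min: Option[Long],
      max: Option[Long],
      avg: Double,
      totalFiles: Long,
      totalSize: Long)

  // Assumption: an empty input yields None for min/max and 0.0 for avg.
  def summarize(sizes: Seq[Long]): SizeMetrics = {
    val total = sizes.sum
    SizeMetrics(
      min = if (sizes.isEmpty) None else Some(sizes.min),
      max = if (sizes.isEmpty) None else Some(sizes.max),
      avg = if (sizes.isEmpty) 0.0 else total.toDouble / sizes.size,
      totalFiles = sizes.size.toLong,
      totalSize = total)
  }

  // Three files of 100, 300 and 200 bytes.
  val example: SizeMetrics = summarize(Seq(100L, 300L, 200L))
  val empty: SizeMetrics = summarize(Seq.empty)
}
```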
- case class FileSizeStats(minFileSize: Long = 0, maxFileSize: Long = 0, totalFiles: Long = 0, totalSize: Long = 0) extends Product with Serializable
-
case class
FileSizeStatsWithHistogram(min: Long, p25: Long, p50: Long, p75: Long, max: Long) extends Product with Serializable
Percentiles on the file sizes in this batch.
- min
Size of the smallest file
- p25
Size of the 25th percentile file
- p50
Size of the 50th percentile file
- p75
Size of the 75th percentile file
- max
Size of the largest file
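One way to obtain such a summary is sketched below. The nearest-rank percentile definition used here is an assumption chosen for illustration; the listing does not say how the library picks the percentile file. `SizePercentiles`, `nearestRank`, and `summarize` are hypothetical names.

```scala
object FileSizePercentilesSketch {
  // Stand-in with the same field names as FileSizeStatsWithHistogram.
  final case class SizePercentiles(min: Long, p25: Long, p50: Long, p75: Long, max: Long)

  // Nearest-rank percentile on an already sorted, non-empty sequence (assumed definition).
  private def nearestRank(sorted: IndexedSeq[Long], p: Int): Long = {
    // ceil(p / 100 * n), computed in integer arithmetic.
    val rank = (p.toLong * sorted.size + 99) / 100
    sorted((math.max(rank, 1L) - 1).toInt)
  }

  // Returns None for an empty batch, since no percentile is defined in that case.
  def summarize(sizes: Seq[Long]): Option[SizePercentiles] =
    if (sizes.isEmpty) None
    else {
      val sorted = sizes.sorted.toIndexedSeq
      Some(SizePercentiles(
        min = sorted.head,
        p25 = nearestRank(sorted, 25),
        p50 = nearestRank(sorted, 50),
        p75 = nearestRank(sorted, 75),
        max = sorted.last))
    }

  val example: Option[SizePercentiles] = summarize(Seq(40L, 10L, 30L, 20L))
}
```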
-
case class
OptimizeMetrics(numFilesAdded: Long, numFilesRemoved: Long, filesAdded: FileSizeMetrics = ..., filesRemoved: FileSizeMetrics = ..., partitionsOptimized: Long = 0, zOrderStats: Option[ZOrderStats] = None, clusteringStats: Option[ClusteringStats] = None, numBins: Long, numBatches: Long, totalConsideredFiles: Long, totalFilesSkipped: Long = 0, preserveInsertionOrder: Boolean = false, numFilesSkippedToReduceWriteAmplification: Long = 0, numBytesSkippedToReduceWriteAmplification: Long = 0, startTimeMs: Long = 0, endTimeMs: Long = 0, totalClusterParallelism: Long = 0, totalScheduledTasks: Long = 0, autoCompactParallelismStats: Option[ParallelismMetrics] = None, deletionVectorStats: Option[DeletionVectorStats] = None, numTableColumns: Long = 0, numTableColumnsWithStats: Long = 0) extends Product with Serializable
Metrics returned by the optimize command.
- numFilesAdded
Number of files added by optimize
- numFilesRemoved
Number of files removed by optimize
- filesAdded
Stats for the files added
- filesRemoved
Stats for the files removed
- partitionsOptimized
Number of partitions optimized
- zOrderStats
Z-Order stats
- clusteringStats
Clustering stats
- numBins
Number of bins
- numBatches
Number of batches
- totalConsideredFiles
Number of files considered for the Optimize operation.
- totalFilesSkipped
Number of files that are skipped from being Optimized.
- preserveInsertionOrder
Whether optimize was run with insertion order preservation enabled.
- numFilesSkippedToReduceWriteAmplification
Number of files skipped for reducing write amplification.
- numBytesSkippedToReduceWriteAmplification
Number of bytes skipped for reducing write amplification.
- startTimeMs
The start time of Optimize command.
- endTimeMs
The end time of Optimize command.
- totalClusterParallelism
The total parallelism of this cluster.
- totalScheduledTasks
The total number of optimize tasks scheduled.
- autoCompactParallelismStats
The metrics of cluster and session parallelism.
- deletionVectorStats
Statistics related to Deletion Vectors.
- numTableColumns
Number of columns in the table.
- numTableColumnsWithStats
Number of table columns for which data skipping stats are collected.
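A consumer of these metrics might derive a few summary figures from them, as in the sketch below. `MetricsSubset` is a hypothetical subset of the fields above, and treating `startTimeMs` and `endTimeMs` as epoch milliseconds is an assumption based on the field names.

```scala
object OptimizeMetricsSketch {
  // Hypothetical subset of OptimizeMetrics; field names follow the list above.
  final case class MetricsSubset(
      numFilesAdded: Long,
      numFilesRemoved: Long,
      startTimeMs: Long,
      endTimeMs: Long)

  // Wall-clock duration of the command in milliseconds.
  def elapsedMs(m: MetricsSubset): Long = m.endTimeMs - m.startTimeMs

  // Net change in file count: negative when more files were removed than written.
  def netFileChange(m: MetricsSubset): Long = m.numFilesAdded - m.numFilesRemoved

  // Example: 40 small files compacted into 4 larger ones over 3.5 seconds.
  val sample: MetricsSubset =
    MetricsSubset(numFilesAdded = 4L, numFilesRemoved = 40L, startTimeMs = 1000L, endTimeMs = 4500L)
}
```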
-
case class
OptimizeStats(addedFilesSizeStats: FileSizeStats = FileSizeStats(), removedFilesSizeStats: FileSizeStats = FileSizeStats(), numPartitionsOptimized: Long = 0, zOrderStats: Option[ZOrderStats] = None, clusteringStats: Option[ClusteringStats] = None, numBins: Long = 0, numBatches: Long = 0, totalConsideredFiles: Long = 0, totalFilesSkipped: Long = 0, preserveInsertionOrder: Boolean = false, numFilesSkippedToReduceWriteAmplification: Long = 0, numBytesSkippedToReduceWriteAmplification: Long = 0, startTimeMs: Long = System.currentTimeMillis(), endTimeMs: Long = 0, totalClusterParallelism: Long = 0, totalScheduledTasks: Long = 0, deletionVectorStats: Option[DeletionVectorStats] = None, numTableColumns: Long = 0, numTableColumnsWithStats: Long = 0, autoCompactParallelismStats: AutoCompactParallelismStats = AutoCompactParallelismStats()) extends Product with Serializable
Stats for an OPTIMIZE operation accumulated across all batches.
-
case class
ParallelismMetrics(maxClusterActiveParallelism: Option[Long] = None, minClusterActiveParallelism: Option[Long] = None, maxSessionActiveParallelism: Option[Long] = None, minSessionActiveParallelism: Option[Long] = None) extends Product with Serializable
These statistics contain the following metrics: the min/max parallelism in use across the whole cluster, and the min/max parallelism occupied by the current session.
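The `Option[Long]` fields suggest that `None` stands for "no observation yet". Under that assumption, a running min/max could be maintained as in the sketch below; `MinMax` and `observe` are hypothetical names and not part of this package.

```scala
object ParallelismSketch {
  // Hypothetical tracker: None means no value has been observed so far.
  final case class MinMax(max: Option[Long] = None, min: Option[Long] = None) {
    // Fold one observation into the running maximum and minimum.
    def observe(v: Long): MinMax =
      MinMax(
        max = Some(max.fold(v)(m => math.max(m, v))),
        min = Some(min.fold(v)(m => math.min(m, v))))
  }

  // Three parallelism samples taken while a job runs.
  val tracked: MinMax = Seq(8L, 3L, 12L).foldLeft(MinMax())((acc, v) => acc.observe(v))
}
```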
-
class
ZCubeFileStatsCollector extends AnyRef
ZCube file statistics collector. An object of this class can be used to collect ZCube statistics. The file statistics collection can be started by initializing an object of this class and calling updateStats on every new file seen. The number of ZCubes, number of files from matching cubes and number of unoptimized files are captured here.
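The usage pattern described above (create a collector, then call `updateStats` for each file) can be sketched as follows. Everything except the method name `updateStats` is hypothetical: `FileInfo`, `matchingCubeId`, the accessor names, and the rule that a file without a matching cube counts as unoptimized are assumptions made for illustration.

```scala
import scala.collection.mutable

object ZCubeCollectorSketch {
  // Hypothetical file descriptor. matchingCubeId is Some(id) when the file belongs to a
  // ZCube that matches the current ordering, and None when it is treated as unoptimized.
  final case class FileInfo(path: String, matchingCubeId: Option[String])

  final class Collector {
    private val cubeIds = mutable.Set.empty[String]
    private var matchingFiles = 0L
    private var unoptimizedFiles = 0L

    // Call once per file seen; returns the file unchanged so it can be used inside a map().
    def updateStats(file: FileInfo): FileInfo = {
      file.matchingCubeId match {
        case Some(id) =>
          cubeIds += id
          matchingFiles += 1
        case None =>
          unoptimizedFiles += 1
      }
      file
    }

    def numZCubes: Int = cubeIds.size
    def numMatchingFiles: Long = matchingFiles
    def numUnoptimizedFiles: Long = unoptimizedFiles
  }

  // Two files from the same cube and one unoptimized file.
  val collector = new Collector
  Seq(
    FileInfo("a.parquet", Some("cube-1")),
    FileInfo("b.parquet", Some("cube-1")),
    FileInfo("c.parquet", None)
  ).foreach(f => collector.updateStats(f))
}
```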
-
case class
ZOrderFileStats(num: Long, size: Long) extends Product with Serializable
Aggregated file stats for a category of ZCube files.
- num
Total number of files.
- size
Total size of files in bytes.
-
class
ZOrderMetrics extends AnyRef
A class to create blob structure for zorder metrics and events.
-
case class
ZOrderStats(strategyName: String, inputCubeFiles: ZOrderFileStats, inputOtherFiles: ZOrderFileStats, inputNumCubes: Long, mergedFiles: ZOrderFileStats, numOutputCubes: Long, mergedNumCubes: Option[Long] = None) extends Product with Serializable
Aggregated stats for the OPTIMIZE ZORDERBY command. This is a public-facing API; consider any change carefully.
- strategyName
ZCubeMergeStrategy used.
- inputCubeFiles
Files in the ZCube matching the current OPTIMIZE operation.
- inputOtherFiles
Files not in any ZCube or in other ZCube orderings.
- inputNumCubes
Number of different cubes among input files.
- mergedFiles
Subset of input files merged by the current operation.
- numOutputCubes
Number of output ZCubes written out.
- mergedNumCubes
Number of different cubes among merged files.
Value Members
- object AddFileWithNumRecords extends Serializable
- object FileSizeStatsWithHistogram extends Serializable
- object ZOrderFileStats extends Serializable