org.apache.spark.sql.delta.stats
AutoCompactPartitionStats
Companion object AutoCompactPartitionStats
class AutoCompactPartitionStats extends AnyRef
This singleton object collects the table partition statistics for each commit that creates
AddFile or RemoveFile objects.
To control memory usage, it keeps at most 'maxNumTablePartitions' partition entries per table and at most 'maxNumPartitions'
partition entries across all tables.
Note:
1. Since the number of partitions tracked per table is limited, the least recently used table partitions are evicted once this limit is reached.
2. If all 'maxNumPartitions' entries are occupied, the partition stats of the least recently used tables are evicted until the number of used partitions falls back below 'maxNumPartitions'.
3. Un-partitioned tables are treated as tables with a single partition.
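The eviction rules in the note can be sketched with a two-level, insertion-ordered map. This is a minimal illustration only, not the Delta Lake implementation: the `Cache` class and its `record`, `tableIds`, and `partitionsOf` methods are hypothetical names, and the sketch assumes that removing and re-inserting an entry is an acceptable way to mark it as most recently used.

```scala
import scala.collection.mutable

// Hypothetical sketch, not the Delta Lake implementation: a two-level LRU cache
// keyed by table id, then by partition hash.
object LruPartitionCacheSketch {
  type PartitionCounts = mutable.LinkedHashMap[Int, Long]

  class Cache(maxNumTablePartitions: Int, maxNumPartitions: Int) {
    // LinkedHashMap keeps insertion order, so the head is the least recently used entry.
    private val tables = mutable.LinkedHashMap.empty[String, PartitionCounts]

    private def totalPartitions: Int = tables.valuesIterator.map(_.size).sum

    def record(tableId: String, partitionHash: Int, numFiles: Long): Unit = synchronized {
      // Remove and re-insert so the table moves to the most recently used position.
      val counts = tables.remove(tableId).getOrElse(mutable.LinkedHashMap.empty[Int, Long])
      val updated = counts.remove(partitionHash).getOrElse(0L) + numFiles
      counts.put(partitionHash, updated)
      // Rule 1: per-table limit, evict the least recently used partitions.
      while (counts.size > maxNumTablePartitions) counts.remove(counts.head._1)
      tables.put(tableId, counts)
      // Rule 2: global limit, evict whole least recently used tables
      // (the table that was just updated is never evicted here).
      while (totalPartitions > maxNumPartitions && tables.size > 1) {
        tables.remove(tables.head._1)
      }
    }

    def tableIds: List[String] = synchronized(tables.keys.toList)

    def partitionsOf(tableId: String): List[Int] =
      synchronized(tables.get(tableId).map(_.keys.toList).getOrElse(Nil))
  }
}
```

With limits of 2 partitions per table and 3 partitions overall, recording three partitions for one table drops the oldest of them, and a later table that pushes the total past 3 evicts the older table entirely.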
Instance Constructors
- new AutoCompactPartitionStats(maxNumTablePartitions: Int, maxNumPartitions: Int)
- maxNumTablePartitions
The size of the hash space for partition keys, which limits memory usage per table.
- maxNumPartitions
The maximum number of partitions that can be occupied.
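One way to read "hash space" is that each partition key is reduced to a slot in the range `[0, maxNumTablePartitions)`. The sketch below assumes this interpretation; the helper `partitionSlot` is hypothetical and the exact hashing used by the class may differ.

```scala
object PartitionHashSketch {
  type PartitionKey = Map[String, String]

  // Assumption: the slot is the partition key's hash code reduced modulo the hash space.
  // Math.floorMod keeps the result in [0, maxNumTablePartitions) even for negative hash codes.
  def partitionSlot(key: PartitionKey, maxNumTablePartitions: Int): Int =
    Math.floorMod(key.hashCode(), maxNumTablePartitions)
}
```

Under this assumption an un-partitioned table, whose partition key is the empty map, always lands in the same single slot, and distinct keys may collide once the hash space is smaller than the number of partitions.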
Type Members
- type PartitionFilesMap = LinkedHashMap[Int, Long]
- type PartitionKey = Map[String, String]
- type PartitionKeySet = Set[Map[String, String]]
- class PartitionStat extends AnyRef
This class stores the state of one table partition. The state includes: the number of small files, the thread that is assigned to compact this partition, and whether the partition was compacted.
Note: Since this class keeps track of both the statistics of the table partition and the state of the auto compaction thread that works on the table partition, any method that accesses any attribute of this class needs to be protected by a synchronized context.
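The synchronization requirement can be illustrated with a small stand-in class. This is a sketch only: `Stat` and its members (`addFiles`, `tryClaim`, `finishCompaction`, `snapshot`) are hypothetical and are not the members of PartitionStat; the point is that every read or write of the shared fields happens inside `synchronized`.

```scala
object PartitionStatSketch {
  // Hypothetical stand-in for PartitionStat; field and method names are illustrative.
  final class Stat {
    private var numSmallFiles: Long = 0L
    private var compactor: Option[Long] = None // id of the thread assigned to compact
    private var wasCompacted: Boolean = false

    // Adjust the small-file count; the count never drops below zero.
    def addFiles(delta: Long): Long = synchronized {
      numSmallFiles = math.max(0L, numSmallFiles + delta)
      numSmallFiles
    }

    // Claim the partition for the given thread; only one claim can be held at a time.
    def tryClaim(threadId: Long): Boolean = synchronized {
      if (compactor.isEmpty) { compactor = Some(threadId); true } else false
    }

    // Only the thread that holds the claim may mark the partition as compacted.
    def finishCompaction(threadId: Long): Unit = synchronized {
      if (compactor.contains(threadId)) {
        compactor = None
        wasCompacted = true
        numSmallFiles = 0L
      }
    }

    def snapshot: (Long, Boolean) = synchronized((numSmallFiles, wasCompacted))
  }
}
```

Keeping the fields private and exposing only synchronized methods means callers cannot observe a half-updated combination of file count and compaction state.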
- type TablePartitionStats = LinkedHashMap[Int, PartitionStat]
This hash table stores all partition states of a table. The key is the hash code of the partition, and the value is a PartitionStat object.
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def collectPartitionStats(collector: AutoCompactPartitionStatsCollector, tableId: String, actions: Iterator[Action]): Unit
Collect the number of files, which are less than minFileSize, added to or removed from each partition from actions.
- def createStatsCollector(minFileSize: Long, errorReporter: (Throwable) => Unit): AutoCompactPartitionStatsCollector
Helper class used to keep state regarding tracking auto-compaction stats of AddFile and RemoveFile actions in a single run that are greater than a passed-in minimum file size. If the collector runs into any non-fatal errors, it will invoke the error reporter on the error and then skip further execution.
- minFileSize
Minimum file size used when tracking auto-compact stats for files
- errorReporter
Function that reports the first error, if any
- returns
A collector object that tracks the Add/Remove file actions of the current commit.
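The "report the first error, then skip further execution" behavior can be sketched as follows. The `Collector` class below is hypothetical and does not reproduce AutoCompactPartitionStatsCollector. It also assumes that files smaller than `minFileSize` are the ones counted, following the description of collectPartitionStats.

```scala
import scala.collection.mutable
import scala.util.control.NonFatal

object StatsCollectorSketch {
  // Hypothetical collector; names and signatures are illustrative only.
  final class Collector(minFileSize: Long, errorReporter: Throwable => Unit) {
    private val counts = mutable.LinkedHashMap.empty[Int, Long]
    private var failed = false

    // `sizeOf` is a by-name parameter so that a failure while reading the size is caught here.
    // `delta` is +1 for an added file and -1 for a removed file.
    def collect(partitionHash: Int, sizeOf: => Long, delta: Long): Unit = {
      if (!failed) { // after the first error, all further work is skipped
        try {
          if (sizeOf < minFileSize) {
            counts.update(partitionHash, counts.getOrElse(partitionHash, 0L) + delta)
          }
        } catch {
          case NonFatal(e) =>
            failed = true
            errorReporter(e) // invoked once, for the first non-fatal error
        }
      }
    }

    def result: Map[Int, Long] = counts.toMap
  }
}
```

Because the failure flag is set before the reporter is called, statistics collection can never make a commit fail more than once or keep retrying on a broken input; it simply stops contributing stats for that run.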
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def filterPartitionsWithSmallFiles(tableId: String, targetPartitions: Set[PartitionKey], minNumFiles: Long): Set[PartitionKey]
- returns
The partitions from targetPartitions that have not been auto-compacted or that have enough small files.
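The selection rule can be expressed as a plain set filter. This is a sketch under stated assumptions: `Stat` and `selectPartitions` are hypothetical names, "enough" is taken to mean at least `minNumFiles`, and a partition with no recorded stats is assumed to count as not yet auto-compacted.

```scala
object FilterPartitionsSketch {
  type PartitionKey = Map[String, String]

  // Hypothetical per-partition state: small-file count and a "was compacted" flag.
  final case class Stat(numFiles: Long, wasAutoCompacted: Boolean)

  def selectPartitions(
      stats: Map[PartitionKey, Stat],
      targetPartitions: Set[PartitionKey],
      minNumFiles: Long): Set[PartitionKey] =
    targetPartitions.filter { key =>
      stats.get(key) match {
        // Keep partitions never compacted, or compacted but holding enough small files again.
        case Some(s) => !s.wasAutoCompacted || s.numFiles >= minNumFiles
        // Assumption: partitions without recorded stats were never auto-compacted.
        case None => true
      }
    }
}
```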
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def markPartitionsAsCompacted(tableId: String, compactedPartitions: Set[PartitionKey]): Unit
- def maxNumFilesInTable(tableId: String): Long
Get the maximum number of files among all partitions inside table tableId.
- def merge(tableId: String, inputPartitionFiles: PartitionFilesMap): Unit
This method merges the inputPartitionFiles of the current committed transaction into the global cache of table partition stats. After the merge is completed, tablePath will be moved to the most recently used position. If the number of occupied partitions exceeds MAX_NUM_PARTITIONS, the least recently used tables will be evicted.
- tableId
The path of the table that contains inputPartitionFiles.
- inputPartitionFiles
The number of files, which are qualified for Auto Compaction, in each partition.
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()