org.apache.spark.sql.delta.stats
AutoCompactPartitionStats
Companion object AutoCompactPartitionStats
class AutoCompactPartitionStats extends AnyRef
This singleton object collects the table partition statistics for each commit that creates
AddFile or RemoveFile objects.
To control memory usage, there are at most 'maxNumTablePartitions' partition entries per table and 'maxNumPartitions'
partition entries across all tables.
Note:
1. Since the number of partitions tracked per table is limited, once this limit is reached the least recently used table partitions will be evicted.
2. If all 'maxNumPartitions' are occupied, the partition stats of the least recently used tables will be evicted until the number of used partitions falls back below 'maxNumPartitions'.
3. Un-partitioned tables are treated as tables with a single partition.
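The eviction rules in the note can be illustrated with a small sketch. It is not the Delta Lake implementation: `BoundedPartitionCache`, `touch`, `tableIds`, and `partitions` are hypothetical names, and the sketch only assumes that insertion-ordered `LinkedHashMap`s are used to approximate least-recently-used order.

```scala
import scala.collection.mutable

// Hypothetical sketch of two-level LRU bookkeeping: a per-table cap and a global cap.
final class BoundedPartitionCache(maxNumTablePartitions: Int, maxNumPartitions: Int) {
  // tableId -> (partition hash -> small-file count); iteration order is oldest first.
  private val tables =
    mutable.LinkedHashMap.empty[String, mutable.LinkedHashMap[Int, Long]]

  def totalPartitions: Int = tables.valuesIterator.map(_.size).sum

  def touch(tableId: String, partitionHash: Int, numFiles: Long): Unit = synchronized {
    // Remove and re-insert so the table and the partition move to the most recently used position.
    val parts = tables.remove(tableId).getOrElse(mutable.LinkedHashMap.empty[Int, Long])
    val updated = parts.remove(partitionHash).getOrElse(0L) + numFiles
    parts.put(partitionHash, updated)
    // Rule 1: per-table cap, evict the least recently used partitions of this table.
    while (parts.size > maxNumTablePartitions) parts.remove(parts.head._1)
    tables.put(tableId, parts)
    // Rule 2: global cap, evict the least recently used tables.
    // The size guard keeps the table that was just touched (an assumption of this sketch).
    while (totalPartitions > maxNumPartitions && tables.size > 1) tables.remove(tables.head._1)
  }

  def tableIds: List[String] = tables.keysIterator.toList

  def partitions(tableId: String): List[Int] =
    tables.get(tableId).map(_.keysIterator.toList).getOrElse(Nil)
}
```

Under rule 3, an un-partitioned table would simply contribute one entry (for example, the hash of an empty partition key) to its inner map.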
Instance Constructors
-
new
AutoCompactPartitionStats(maxNumTablePartitions: Int, maxNumPartitions: Int)
- maxNumTablePartitions
The size of the hash space for partition keys, used to limit memory usage per table.
- maxNumPartitions
The maximum number of partitions that can be occupied across all tables.
Type Members
- type PartitionFilesMap = LinkedHashMap[Int, Long]
- type PartitionKey = Map[String, String]
- type PartitionKeySet = Set[Map[String, String]]
-
class
PartitionStat extends AnyRef
This class stores the state of one table partition. The state includes:
-- the number of small files,
-- the thread that is assigned to compact this partition, and
-- whether the partition was compacted.
Note: Since this class keeps track of both the statistics of the table partition and the state of the auto compaction thread that works on it, any method that accesses any attribute of this class needs to be protected by a synchronized context.
-
type
TablePartitionStats = LinkedHashMap[Int, PartitionStat]
This hash table stores all partition states of a table. The key is the hash code of the partition, and the value is a PartitionStat object.
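As an illustration of how a `PartitionKey` could be reduced to the `Int` key of such a table, the sketch below folds the key's hash code into a bounded hash space. The helper `bucketOf` and its `hashSpace` parameter are hypothetical, and taking the hash modulo `maxNumTablePartitions` is an assumption drawn from the constructor description, not a confirmed detail of the library.

```scala
object PartitionHashSketch {
  type PartitionKey = Map[String, String]

  // Hypothetical helper: map a partition key to a non-negative bucket in [0, hashSpace).
  def bucketOf(key: PartitionKey, hashSpace: Int): Int = {
    require(hashSpace > 0, "hashSpace must be positive")
    // floorMod keeps the result non-negative even when hashCode is negative.
    java.lang.Math.floorMod(key.hashCode, hashSpace)
  }
}
```

With a scheme like this, distinct partitions may share a bucket, so the statistics become approximate in exchange for a fixed upper bound on entries per table.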
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
collectPartitionStats(collector: AutoCompactPartitionStatsCollector, tableId: String, actions: Iterator[Action]): Unit
Collects, from actions, the number of files smaller than minFileSize that were added to or removed from each partition.
-
def
createStatsCollector(minFileSize: Long, errorReporter: (Throwable) ⇒ Unit): AutoCompactPartitionStatsCollector
Helper class used to keep state for tracking auto-compaction stats of AddFile and RemoveFile actions in a single run that are greater than a passed-in minimum file size. If the collector runs into any non-fatal error, it will invoke the error reporter on the error and then skip further execution.
- minFileSize
Minimum file size for the files whose auto-compact stats we track.
- errorReporter
Function that reports the first error, if any
- returns
A collector object that tracks the Add/Remove file actions of the current commit.
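The "report the first error, then skip further execution" behavior can be sketched as follows. `FileAction` and `Collector` are hypothetical stand-ins rather than Delta's `AddFile`, `RemoveFile`, or `AutoCompactPartitionStatsCollector`. The sketch counts files smaller than `minFileSize`, following the description of `collectPartitionStats`; that choice is an assumption.

```scala
import scala.collection.mutable
import scala.util.control.NonFatal

object CollectorSketch {
  // Hypothetical stand-in for an add/remove file action.
  final case class FileAction(partitionHash: Int, size: Long, isAdd: Boolean)

  final class Collector(minFileSize: Long, errorReporter: Throwable => Unit) {
    private val counts = mutable.LinkedHashMap.empty[Int, Long]
    private var active = true // flips to false after the first non-fatal error

    def collect(action: FileAction): Unit = if (active) {
      try {
        require(action.size >= 0, s"negative file size: ${action.size}")
        if (action.size < minFileSize) {
          // Added small files increase the count, removed ones decrease it.
          val delta = if (action.isAdd) 1L else -1L
          counts.update(action.partitionHash, counts.getOrElse(action.partitionHash, 0L) + delta)
        }
      } catch {
        case NonFatal(e) =>
          errorReporter(e) // report only the first error
          active = false   // then ignore all later actions
      }
    }

    // Net change in small-file count per partition hash.
    def result: Map[Int, Long] = counts.toMap
  }
}
```

Disabling the collector after the first failure keeps statistics collection from interfering with the commit itself, at the cost of dropping the remaining actions of that run.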
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
filterPartitionsWithSmallFiles(tableId: String, targetPartitions: Set[PartitionKey], minNumFiles: Long): Set[PartitionKey]
- returns
The partitions from targetPartitions that have not been auto-compacted or that have enough small files.
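The selection rule described for this method can be sketched as a simple predicate over per-partition state. `Stat`, `filterPartitions`, and the plain `Map` used as a cache are hypothetical. Treating "enough" as `numFiles >= minNumFiles` and excluding partitions with no recorded stats are both assumptions of this sketch.

```scala
object FilterSketch {
  type PartitionKey = Map[String, String]

  // Hypothetical per-partition state: small-file count and whether it was compacted before.
  final case class Stat(numFiles: Long, wasAutoCompacted: Boolean)

  def filterPartitions(
      stats: Map[PartitionKey, Stat],
      targetPartitions: Set[PartitionKey],
      minNumFiles: Long): Set[PartitionKey] =
    targetPartitions.filter { key =>
      // Keep a partition if it was never auto-compacted or has accumulated enough small files.
      // Partitions without recorded stats are dropped (an assumption of this sketch).
      stats.get(key).exists(s => !s.wasAutoCompacted || s.numFiles >= minNumFiles)
    }
}
```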
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def markPartitionsAsCompacted(tableId: String, compactedPartitions: Set[PartitionKey]): Unit
-
def
maxNumFilesInTable(tableId: String): Long
Get the maximum number of files among all partitions inside table tableId.
-
def
merge(tableId: String, inputPartitionFiles: PartitionFilesMap): Unit
This method merges the inputPartitionFiles of the current committed transaction into the global cache of table partition stats. After the merge is completed, tablePath will be moved to the most recently used position. If the number of occupied partitions exceeds MAX_NUM_PARTITIONS, the least recently used tables will be evicted.
- tableId
The path of the table that contains inputPartitionFiles.
- inputPartitionFiles
The number of files, which are qualified for Auto Compaction, in each partition.
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()