abstract class RowBasedKeyValueBatch extends MemoryConsumer with Closeable
RowBasedKeyValueBatch stores key-value pairs in a contiguous memory region.
Each key or value is stored as a single UnsafeRow. Each record contains one key, one value,
and some auxiliary data that differs between the two implementations,
FixedLengthRowBasedKeyValueBatch and VariableLengthRowBasedKeyValueBatch.
We use FixedLengthRowBasedKeyValueBatch if all fields in the key and the value are fixed-length
data types. Otherwise we use VariableLengthRowBasedKeyValueBatch.
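The selection rule above can be sketched with a toy schema model. Everything in this block (`FieldType`, `isFixedLength`, `chooseBatchKind`) is a hypothetical stand-in for illustration rather than Spark's own types, and the split between fixed-length and variable-length types is an assumption of the sketch.

```scala
object BatchKindSketch {
  // Hypothetical, simplified field types; not Spark's DataType hierarchy.
  sealed trait FieldType
  case object IntField extends FieldType
  case object LongField extends FieldType
  case object DoubleField extends FieldType
  case object StringField extends FieldType

  // Assumption for this sketch: primitives are fixed-length, strings are not.
  def isFixedLength(t: FieldType): Boolean = t match {
    case IntField | LongField | DoubleField => true
    case StringField                        => false
  }

  // The fixed-length batch is chosen only if every key and value field is fixed-length.
  def chooseBatchKind(key: Seq[FieldType], value: Seq[FieldType]): String =
    if ((key ++ value).forall(isFixedLength)) "FixedLengthRowBasedKeyValueBatch"
    else "VariableLengthRowBasedKeyValueBatch"

  def main(args: Array[String]): Unit = {
    println(chooseBatchKind(Seq(IntField), Seq(LongField, DoubleField)))
    println(chooseBatchKind(Seq(StringField), Seq(LongField)))
  }
}
```

A single variable-length field on either side is enough to select the variable-length implementation.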
RowBasedKeyValueBatch is backed by a single page / MemoryBlock (ranging from 1 MB to 64 MB depending on the system configuration). If the page is full, the aggregate logic should fall back to a second-level, larger hash map. We intentionally use the single-page design because it simplifies memory address encoding and decoding for each key-value pair. Because the maximum capacity of RowBasedKeyValueBatch is only 2^16 rows, it is unlikely that we need a second page anyway: filling a 64 MB page would require key-value pairs to average more than 1024 bytes each.
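The 1024-byte figure follows from dividing the largest page size by the row cap. A minimal worked calculation, assuming the 64 MB upper bound and the 2^16 row limit stated above (the object and method names are illustrative only):

```scala
object PageCapacitySketch {
  val maxRows: Long = 1L << 16                 // 2^16 = 65536 rows
  val maxPageBytes: Long = 64L * 1024 * 1024   // 64 MB upper bound from the text

  // Average record size at which maxRows records exactly fill a page of pageBytes.
  def breakEvenRecordBytes(pageBytes: Long): Long = pageBytes / maxRows

  def main(args: Array[String]): Unit = {
    println(breakEvenRecordBytes(maxPageBytes)) // prints 1024
  }
}
```

With records averaging less than this break-even size, the row cap is reached before the page fills.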
Instance Constructors
- new RowBasedKeyValueBatch(keySchema: StructType, valueSchema: StructType, maxRows: Int, manager: TaskMemoryManager)
- Attributes
- protected[expressions]
Abstract Value Members
- abstract def appendRow(kbase: AnyRef, koff: Long, klen: Int, vbase: AnyRef, voff: Long, vlen: Int): UnsafeRow
Appends a key-value pair by copying the data into the backing MemoryBlock. Returns an UnsafeRow pointing to the value on success; otherwise returns null.
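The append-or-fall-back pattern implied by this contract can be sketched with a toy batch. `ToyBatch`, its byte accounting, and `insert` are hypothetical; the sketch returns an `Option` where the real method returns an UnsafeRow or null, and the second-level map is modeled as a plain mutable map.

```scala
import scala.collection.mutable

object AppendFallbackSketch {
  // Hypothetical stand-in for a single-page batch: a byte budget plus a row cap.
  final class ToyBatch(pageBytes: Int, maxRows: Int) {
    private val rows = mutable.ArrayBuffer.empty[(String, Long)]
    private var usedBytes = 0

    // Some(rowId) on success, None when the page or the row limit is exhausted.
    def append(key: String, value: Long): Option[Int] = {
      val recordBytes = key.length + 8 // toy size estimate, not a real row layout
      if (rows.size >= maxRows || usedBytes + recordBytes > pageBytes) None
      else {
        rows += ((key, value))
        usedBytes += recordBytes
        Some(rows.size - 1)
      }
    }

    def numRows: Int = rows.size
  }

  // Try the batch first; if the append fails, store the pair in the fallback map.
  // Returns true when the pair landed in the batch.
  def insert(batch: ToyBatch, fallback: mutable.Map[String, Long],
             key: String, value: Long): Boolean =
    batch.append(key, value) match {
      case Some(_) => true
      case None =>
        fallback(key) = value
        false
    }

  def main(args: Array[String]): Unit = {
    val batch = new ToyBatch(pageBytes = 20, maxRows = 10)
    val fallback = mutable.Map.empty[String, Long]
    Seq("aa" -> 1L, "bb" -> 2L, "cc" -> 3L).foreach { case (k, v) =>
      insert(batch, fallback, k, v)
    }
    println(s"batch rows = ${batch.numRows}, fallback = $fallback")
  }
}
```

Callers must check the result of every append, since a full page is reported through the return value rather than an exception.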
- abstract def getKeyRow(rowId: Int): UnsafeRow
Returns the key row in this batch at rowId. The returned key row is reused across calls.
- abstract def getValueFromKey(rowId: Int): UnsafeRow
Returns the value row in two steps: 1) look up the key row with the same id (skipped if the key row is cached); 2) retrieve the value row by reusing the metadata from step 1. In most cases, step 1 is skipped because getKeyRow(id) is often called before getValueRow(id).
- Attributes
- protected[expressions]
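The two-step lookup with a cached key can be sketched as follows. This is a toy model under stated assumptions: `ToyBatch`, `locate`, and the `keyLookups` counter are hypothetical, and rows are plain tuples rather than UnsafeRow instances.

```scala
object CachedLookupSketch {
  // Hypothetical toy batch: records are (key, value) pairs addressed by row id.
  final class ToyBatch(records: Vector[(String, Long)]) {
    private var cachedRowId: Int = -1 // id of the most recently located record
    var keyLookups: Int = 0           // counts how often step 1 actually runs

    // Step 1: locate the record, unless it is already the cached one.
    private def locate(rowId: Int): Unit =
      if (rowId != cachedRowId) {
        keyLookups += 1
        cachedRowId = rowId
      }

    def getKeyRow(rowId: Int): String = {
      locate(rowId)
      records(rowId)._1
    }

    // Step 2 reuses what step 1 found; step 1 is skipped on a cache hit.
    private def getValueFromKey(rowId: Int): Long = {
      locate(rowId)
      records(rowId)._2
    }

    def getValueRow(rowId: Int): Long = getValueFromKey(rowId)
  }

  def main(args: Array[String]): Unit = {
    val batch = new ToyBatch(Vector("a" -> 1L, "b" -> 2L))
    println(batch.getKeyRow(1))
    println(batch.getValueRow(1)) // cache hit: no second lookup
    println(batch.keyLookups)
  }
}
```

Calling the key accessor and then the value accessor with the same id costs one lookup in this model; asking for a value at a different id triggers a fresh one.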
- abstract def rowIterator(): KVIterator[UnsafeRow, UnsafeRow]
Returns an iterator that goes through all rows in the batch.
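A cursor-style traversal of key-value rows might look like the sketch below. `ToyKVIterator` is a hypothetical trait assumed to expose a next/getKey/getValue/close shape; it is not the Spark class, and the rows are plain tuples.

```scala
object RowIterationSketch {
  // Hypothetical cursor-style iterator: advance with next(), then read the current pair.
  trait ToyKVIterator[K, V] {
    def next(): Boolean
    def getKey: K
    def getValue: V
    def close(): Unit
  }

  // Builds an iterator over an in-memory sequence of pairs.
  def iteratorOver[K, V](rows: IndexedSeq[(K, V)]): ToyKVIterator[K, V] =
    new ToyKVIterator[K, V] {
      private var pos = -1
      def next(): Boolean = { pos += 1; pos < rows.length }
      def getKey: K = rows(pos)._1
      def getValue: V = rows(pos)._2
      def close(): Unit = ()
    }

  // Sums all values, closing the iterator even if the loop fails.
  def sumValues(it: ToyKVIterator[String, Long]): Long = {
    var total = 0L
    try {
      while (it.next()) total += it.getValue
    } finally it.close()
    total
  }

  def main(args: Array[String]): Unit = {
    println(sumValues(iteratorOver(Vector("a" -> 1L, "b" -> 2L))))
  }
}
```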
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def acquireMemory(arg0: Long): Long
- Definition Classes
- MemoryConsumer
- def allocateArray(arg0: Long): LongArray
- Definition Classes
- MemoryConsumer
- def allocatePage(arg0: Long): MemoryBlock
- Attributes
- protected[memory]
- Definition Classes
- MemoryConsumer
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- final def close(): Unit
- Definition Classes
- RowBasedKeyValueBatch → Closeable → AutoCloseable
- Annotations
- @Override()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def freeArray(arg0: LongArray): Unit
- Definition Classes
- MemoryConsumer
- def freeMemory(arg0: Long): Unit
- Definition Classes
- MemoryConsumer
- def freePage(arg0: MemoryBlock): Unit
- Attributes
- protected[memory]
- Definition Classes
- MemoryConsumer
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getMode(): MemoryMode
- Definition Classes
- MemoryConsumer
- def getUsed(): Long
- Definition Classes
- MemoryConsumer
- final def getValueRow(rowId: Int): UnsafeRow
Returns the value row in this batch at rowId. The returned value row is reused across calls. Because getValueRow(id) is always called after getKeyRow(id) with the same id, we use getValueFromKey(id) to retrieve the value row, which reuses metadata from the cached key.
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def numRows(): Int
- final def spill(size: Long, trigger: MemoryConsumer): Long
Sometimes the TaskMemoryManager may call spill() on its associated MemoryConsumers to make space for new consumers. For RowBasedKeyValueBatch, we do not actually spill and return 0. We should not throw an OutOfMemory exception here because other associated consumers might spill.
- Definition Classes
- RowBasedKeyValueBatch → MemoryConsumer
- Annotations
- @Override()
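Why returning 0 is safer than throwing can be illustrated with a toy memory manager. All names here (`ToyConsumer`, `ToyBatchConsumer`, `ToySpillableConsumer`, `reclaim`) are hypothetical simplifications and do not reflect Spark's memory management API.

```scala
object NoSpillSketch {
  // Hypothetical, simplified consumer interface.
  trait ToyConsumer {
    def used: Long
    def spill(size: Long): Long // returns the number of bytes released
  }

  // Models the batch: it holds memory but never spills and never throws.
  final class ToyBatchConsumer(val used: Long) extends ToyConsumer {
    def spill(size: Long): Long = 0L
  }

  // Models some other consumer that can release memory on request.
  final class ToySpillableConsumer(private var bytes: Long) extends ToyConsumer {
    def used: Long = bytes
    def spill(size: Long): Long = {
      val released = math.min(size, bytes)
      bytes -= released
      released
    }
  }

  // Asks each consumer in turn until enough bytes have been released.
  def reclaim(consumers: Seq[ToyConsumer], needed: Long): Long = {
    var released = 0L
    consumers.foreach { c =>
      if (released < needed) released += c.spill(needed - released)
    }
    released
  }

  def main(args: Array[String]): Unit = {
    val consumers = Seq(new ToyBatchConsumer(100L), new ToySpillableConsumer(50L))
    println(reclaim(consumers, 30L)) // the batch releases 0, the other consumer covers it
  }
}
```

If the batch consumer threw instead of returning 0, the loop would abort before reaching a consumer that is able to free memory.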
- def spill(): Unit
- Definition Classes
- MemoryConsumer
- Annotations
- @throws(classOf[java.io.IOException])
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()