class VectorizedHashMapGenerator extends HashMapGenerator
This helper class generates an append-only vectorized hash map that acts as a 'cache' for
extremely fast key-value lookups while evaluating aggregates, falling back to the
BytesToBytesMap if a given key isn't found. The map is code-generated in HashAggregate to
speed up aggregates with keys.
It is backed by a power-of-2-sized array for index lookups and a columnar batch that stores the
key-value pairs. Index lookups in the array rely on linear probing (with a small maximum number
of tries) and an inexpensive hash function, which makes them very efficient for the majority
of lookups. However, linear probing and an inexpensive hash function also make the map
less robust than the BytesToBytesMap (especially for a large number of keys or for
certain distributions of keys), so it must fall back on the latter for correctness. A
secondary columnar batch logically projects over the original columnar batch and
is equivalent to the BytesToBytesMap aggregate buffer.
NOTE: This vectorized hash map currently doesn't support nullable keys and falls back to the
BytesToBytesMap to store them.
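The index arithmetic behind a power-of-2-sized bucket array with bounded linear probing can be sketched as follows. This is only an illustration, not code emitted by this generator: the class ProbeSequenceSketch and the method probeSequence are hypothetical names, and the sketch merely lists which buckets a lookup would visit before giving up.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of Spark.
final class ProbeSequenceSketch {

    // Returns the bucket indices a lookup would visit for hash h, assuming a
    // table of numBuckets slots (a power of 2) and at most maxSteps probes.
    static int[] probeSequence(long h, int numBuckets, int maxSteps) {
        if (numBuckets <= 0 || Integer.bitCount(numBuckets) != 1) {
            throw new IllegalArgumentException("numBuckets must be a power of 2");
        }
        int mask = numBuckets - 1;
        int[] visited = new int[maxSteps];
        // For a power-of-2 size, masking keeps only the low bits, which gives
        // a non-negative index even when the hash is negative.
        int idx = (int) h & mask;
        for (int step = 0; step < maxSteps; step++) {
            visited[step] = idx;
            idx = (idx + 1) & mask; // linear probing, wrapping at the end
        }
        return visited;
    }

    public static void main(String[] args) {
        // Starts at bucket 14 and wraps around to bucket 0 on the third probe.
        System.out.println(Arrays.toString(probeSequence(14L, 16, 3))); // prints [14, 15, 0]
    }
}
```

Capping the number of probes keeps every lookup cheap, at the cost that some keys cannot be placed at all; those are the keys that have to go to the fallback map.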
Instance Constructors
- new VectorizedHashMapGenerator(ctx: CodegenContext, aggregateExpressions: Seq[AggregateExpression], generatedClassName: String, groupingKeySchema: StructType, bufferSchema: StructType, bitMaxCapacity: Int)
Type Members
- case class Buffer(dataType: DataType, name: String) extends Product with Serializable
- Definition Classes
- HashMapGenerator
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- val buffVars: Seq[ExprCode]
- Definition Classes
- HashMapGenerator
- val bufferValues: Seq[Buffer]
- Definition Classes
- HashMapGenerator
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def genComputeHash(ctx: CodegenContext, input: String, dataType: DataType, result: String): String
- Attributes
- protected
- Definition Classes
- HashMapGenerator
- def generate(): String
- Definition Classes
- HashMapGenerator
- final def generateClose(): String
- Attributes
- protected
- Definition Classes
- HashMapGenerator
- def generateEquals(): String
Generates a method that returns true if the group-by keys exist at a given index in the associated org.apache.spark.sql.execution.vectorized.OnHeapColumnVector. For instance, if we have 2 long group-by keys, the generated function would be of the form:

    private boolean equals(int idx, long agg_key, long agg_key1) {
      return vectors[0].getLong(buckets[idx]) == agg_key &&
        vectors[1].getLong(buckets[idx]) == agg_key1;
    }

- Attributes
- protected
- Definition Classes
- VectorizedHashMapGenerator → HashMapGenerator
- def generateFindOrInsert(): String
Generates a method that returns an org.apache.spark.sql.execution.vectorized.MutableColumnarRow which keeps track of the aggregate value(s) for a given set of keys. If the corresponding row doesn't exist, the generated method adds it to the associated org.apache.spark.sql.execution.vectorized.OnHeapColumnVector. For instance, if we have 2 long group-by keys, the generated function would be of the form:

    public MutableColumnarRow findOrInsert(long agg_key, long agg_key1) {
      long h = hash(agg_key, agg_key1);
      int step = 0;
      int idx = (int) h & (numBuckets - 1);
      while (step < maxSteps) {
        // Return bucket index if it's either an empty slot or already contains the key
        if (buckets[idx] == -1) {
          if (numRows < capacity) {
            vectors[0].putLong(numRows, agg_key);
            vectors[1].putLong(numRows, agg_key1);
            vectors[2].putLong(numRows, 0);
            buckets[idx] = numRows++;
            // Point at the row just written; numRows now refers to the next free row
            aggBufferRow.rowId = buckets[idx];
            return aggBufferRow;
          } else {
            // No more space
            return null;
          }
        } else if (equals(idx, agg_key, agg_key1)) {
          aggBufferRow.rowId = buckets[idx];
          return aggBufferRow;
        }
        idx = (idx + 1) & (numBuckets - 1);
        step++;
      }
      // Didn't find it
      return null;
    }

- Attributes
- protected
- Definition Classes
- VectorizedHashMapGenerator → HashMapGenerator
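To make the find-or-insert behaviour and the fallback path concrete, here is a minimal runnable sketch under several assumptions: plain long arrays stand in for the column vectors, a java.util.HashMap stands in for the BytesToBytesMap, the aggregate is a simple sum, and a row id of -1 replaces the null return. FastAggCacheSketch and all of its members are hypothetical names, not Spark APIs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; arrays and HashMap are stand-ins, not Spark classes.
final class FastAggCacheSketch {
    private final int capacity;    // maximum number of rows in the fast map
    private final int numBuckets;  // must be a power of 2
    private final int maxSteps;    // maximum number of probes per lookup
    private final int[] buckets;   // bucket -> row id, or -1 if empty
    private final long[] key0;     // first group-by key column
    private final long[] key1;     // second group-by key column
    private final long[] sum;      // aggregate buffer column
    private int numRows = 0;
    private final Map<List<Long>, Long> fallback = new HashMap<>();

    FastAggCacheSketch(int capacity, int numBuckets, int maxSteps) {
        if (numBuckets <= 0 || Integer.bitCount(numBuckets) != 1) {
            throw new IllegalArgumentException("numBuckets must be a power of 2");
        }
        this.capacity = capacity;
        this.numBuckets = numBuckets;
        this.maxSteps = maxSteps;
        this.buckets = new int[numBuckets];
        Arrays.fill(this.buckets, -1);
        this.key0 = new long[capacity];
        this.key1 = new long[capacity];
        this.sum = new long[capacity];
    }

    // Returns the row id for the key pair, appending a new row if needed.
    // Returns -1 when the map is full or the probe limit is reached.
    int findOrInsert(long k0, long k1) {
        long h = k0 ^ k1;
        int idx = (int) h & (numBuckets - 1);
        for (int step = 0; step < maxSteps; step++) {
            if (buckets[idx] == -1) {
                if (numRows >= capacity) {
                    return -1; // no more space
                }
                key0[numRows] = k0;
                key1[numRows] = k1;
                sum[numRows] = 0L;
                buckets[idx] = numRows++;
                return buckets[idx];
            } else if (key0[buckets[idx]] == k0 && key1[buckets[idx]] == k1) {
                return buckets[idx];
            }
            idx = (idx + 1) & (numBuckets - 1);
        }
        return -1; // probe limit reached
    }

    // Adds value to the running sum, using the fallback map when needed.
    void add(long k0, long k1, long value) {
        int row = findOrInsert(k0, k1);
        if (row >= 0) {
            sum[row] += value;
        } else {
            fallback.merge(List.of(k0, k1), value, Long::sum);
        }
    }

    // Linear scan for brevity; a real lookup would probe the buckets instead.
    long sumFor(long k0, long k1) {
        for (int r = 0; r < numRows; r++) {
            if (key0[r] == k0 && key1[r] == k1) {
                return sum[r];
            }
        }
        return fallback.getOrDefault(List.of(k0, k1), 0L);
    }

    int fastRows() { return numRows; }

    int fallbackRows() { return fallback.size(); }

    public static void main(String[] args) {
        FastAggCacheSketch cache = new FastAggCacheSketch(2, 4, 2);
        cache.add(1L, 0L, 10L);
        cache.add(1L, 0L, 5L);
        cache.add(2L, 0L, 7L);
        cache.add(3L, 0L, 1L); // fast map is full, so this key uses the fallback
        System.out.println(cache.sumFor(1L, 0L) + " " + cache.fastRows() + " " + cache.fallbackRows());
    }
}
```

Because the fast map is append-only, a key that was once rejected is rejected on every later lookup as well, so in this sketch each key lives in exactly one of the two maps.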
- final def generateHashFunction(): String
Generates a method that computes a hash, currently by XOR-ing all individual group-by keys. For instance, if we have 2 long group-by keys, the generated function would be of the form:

    private long hash(long agg_key, long agg_key1) {
      return agg_key ^ agg_key1;
    }

- Attributes
- protected
- Definition Classes
- HashMapGenerator
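The class description notes that an inexpensive hash function is less robust for certain distributions of keys. A short sketch of why an XOR-based hash behaves that way: swapped key pairs collide, and every pair of equal keys hashes to 0. XorHashSketch is a hypothetical name used only for this illustration.

```java
// Hypothetical illustration; mirrors the shape of the generated hash method.
final class XorHashSketch {

    static long hash(long aggKey, long aggKey1) {
        return aggKey ^ aggKey1;
    }

    public static void main(String[] args) {
        // XOR is commutative, so (1, 2) and (2, 1) land in the same bucket.
        System.out.println(hash(1L, 2L) == hash(2L, 1L)); // prints true
        // Any key XOR-ed with itself is 0, so all (k, k) pairs collide.
        System.out.println(hash(7L, 7L) == hash(42L, 42L)); // prints true
    }
}
```

With a small probe limit, such clusters of colliding keys quickly exhaust the available tries, which is one reason some keys end up in the fallback map.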
- def generateRowIterator(): String
- Attributes
- protected
- Definition Classes
- VectorizedHashMapGenerator → HashMapGenerator
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- val groupingKeySignature: String
- Definition Classes
- HashMapGenerator
- val groupingKeys: Seq[Buffer]
- Definition Classes
- HashMapGenerator
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def initializeAggregateHashMap(): String
- Attributes
- protected
- Definition Classes
- VectorizedHashMapGenerator → HashMapGenerator
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()