package expressions
Type Members
- class HilbertStates extends AnyRef
-
case class
InterleaveBits(children: Seq[Expression]) extends Expression with ExpectsInputTypes with SQLConfHelper with CodegenFallback with Product with Serializable
Interleaves the bits of its input data in a round-robin fashion.
If the input data is seen as a series of multidimensional points, this function computes the corresponding Z-values in a way that preserves data locality: input points that are close in the multidimensional space will be mapped to points that are close on the Z-order curve.
The returned value is a byte array whose size is 4 * the number of input columns.
- Note
Only supports input expressions of type Int for now.
- See also
https://en.wikipedia.org/wiki/Z-order_curve
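The round-robin interleaving described above can be sketched in plain Scala. This is a minimal illustration, not the expression's actual implementation: the object name `InterleaveSketch` and the method `interleave` are hypothetical, and the sketch assumes bits are consumed from the most significant bit of each input down to the least significant.

```scala
object InterleaveSketch {
  /** Interleaves the bits of `inputs` round-robin (assumed: most significant bit first).
    * The result holds 4 bytes per input Int.
    */
  def interleave(inputs: Seq[Int]): Array[Byte] = {
    val out = new Array[Byte](4 * inputs.length)
    var outBit = 0 // position in the output bit stream; 0 is the first bit written
    for (bit <- 31 to 0 by -1; value <- inputs) {
      val b = (value >>> bit) & 1
      // Fill each output byte from its high bit down to its low bit.
      out(outBit / 8) = (out(outBit / 8) | (b << (7 - outBit % 8))).toByte
      outBit += 1
    }
    out
  }
}
```

With two inputs, the lowest bit of the first input lands in the second-to-last bit of the output and the lowest bit of the second input lands in the last bit, so nearby points share a long common prefix of output bits.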
-
case class
PartitionerExpr(child: Expression, partitioner: Partitioner) extends UnaryExpression with Product with Serializable
Thin wrapper around Partitioner instances that are used in Shuffle operations. TODO: If needed elsewhere, consider moving it into its own file.
-
case class
RangePartitionId(child: Expression, numPartitions: Int) extends UnaryExpression with Unevaluable with Product with Serializable
Unevaluable placeholder expression to be rewritten by the optimizer into PartitionerExpr
This is just a convenient way to introduce a PartitionerExpr without the need to manually construct the RangePartitioner beforehand, which requires an RDD to be sampled in order to determine range partition boundaries. The optimizer rule will take care of all that.
- See also
org.apache.spark.sql.delta.optimizer.RangeRepartitionIdRewrite
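To make the role of sampled range boundaries concrete, here is a rough sketch in plain Scala with no Spark dependency. It is an assumption-laden illustration, not the optimizer rule or Spark's RangePartitioner: `RangePartitionSketch`, `bounds`, and `partitionId` are hypothetical names, keys are plain Ints, the sample is assumed to be non-empty, and boundaries are picked as simple quantiles of the sample.

```scala
object RangePartitionSketch {
  /** Picks up to numPartitions - 1 upper bounds from a non-empty sample (simple quantiles). */
  def bounds(sample: Seq[Int], numPartitions: Int): IndexedSeq[Int] = {
    val sorted = sample.sorted.toIndexedSeq
    (1 until numPartitions).map(i => sorted(i * sorted.length / numPartitions)).distinct
  }

  /** Partition id is the index of the first bound >= key, or the last partition if none. */
  def partitionId(bounds: IndexedSeq[Int], key: Int): Int = {
    val p = bounds.indexWhere(key <= _)
    if (p == -1) bounds.length else p
  }
}
```

For example, sampling the keys 1 to 100 into four partitions yields the bounds 26, 51 and 76, so key 27 maps to partition 1 and key 100 maps to partition 3.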
Value Members
-
object
HilbertIndex
The following code is based on this paper: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=bfd6d94c98627756989b0147a68b7ab1f881a0d6 with optimizations around matrix manipulation taken from this one: https://pdfs.semanticscholar.org/4043/1c5c43a2121e1bc071fc035e90b8f4bb7164.pdf
At a high level you construct a GeneratorTable with the getStateGenerator method. That represents the information necessary to construct a state list for a given number of dimensions, N. Once you have the generator table for your dimension you can construct a state list. You can then turn those state lists into compact state lists that store all the information in one large array of longs.
- object HilbertUtils
- object InterleaveBits extends Serializable
-
object
JoinedProjection
Helper class for generating a joined projection.
This class is used to instantiate a "Joined Row" - a wrapper that makes two rows appear to be a single concatenated row, by using nested access. It is primarily used during statistics collection to update a buffer of per-column aggregates (i.e. the left-hand side row) with stats from the latest row processed (i.e. the right-hand side row).
Implementation Note: If we instead stored leftRow and rightRow we would have to perform size checks on leftRow during every access, which is slow.
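The trade-off in the implementation note can be illustrated with a small plain-Scala sketch. Everything here is hypothetical and only models the idea: `JoinedRowSketch`, `FlatJoined`, `Nested`, and `bind` are invented names, and rows are modelled as `IndexedSeq[Any]` rather than Spark rows. The flat view compares the ordinal against the left row's size on every access, while the nested view decides which side a column lives on once, when the accessor is built.

```scala
object JoinedRowSketch {
  type Row = IndexedSeq[Any]

  /** Flat view: every access checks the ordinal against the size of the left row. */
  final class FlatJoined(left: Row, right: Row) {
    def get(i: Int): Any =
      if (i < left.length) left(i) else right(i - left.length)
  }

  /** Nested view: the two rows are fields of a wrapper and are reached by nested access. */
  final case class Nested(left: Row, right: Row)

  /** Resolves the side and ordinal once; the returned accessor does no size check. */
  def bind(leftSize: Int, i: Int): Nested => Any =
    if (i < leftSize) row => row.left(i)
    else {
      val j = i - leftSize
      row => row.right(j)
    }
}
```

In the statistics-collection scenario described above, the accessors would be bound once per column and then reused for every row processed.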