public abstract class VectorExpression extends Object implements Serializable
A vector expression is a vectorized execution tree that computes the same result that a (row-mode) ExprNodeDesc tree describes.
A vector expression has 0, 1, or more parameters and an optional output column. These are normally passed to the vector expression object's constructor. A few special-case classes accept extra parameters via set* methods.
An ExprNodeColumnDesc vectorizes to the IdentityExpression class, where the input column number parameter is the same as the output column number.
An ExprNodeGenericFuncDesc's generic function can vectorize to many different vectorized classes depending on the parameter expression kinds (column, constant, etc.) and data types. Each vectorized class implements getDescriptor(), which indicates the particular expression kind and data type specialization that class is designed for. The VectorizationContext class uses the descriptor to match the right vectorized class.
The constructor parameters need to be in the same order as the generic function because the VectorizationContext class automates parameter generation and object construction.
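The descriptor-driven selection described above can be pictured with a small, self-contained sketch. ArgKind, SketchDescriptor, and SketchRegistry are hypothetical simplifications invented for illustration; they are not Hive's VectorExpressionDescriptor.Descriptor or VectorizationContext, and the registered class names are only example strings.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model of descriptor-based matching; NOT Hive's
// VectorExpressionDescriptor or VectorizationContext API.
enum ArgKind { COLUMN, SCALAR }

final class SketchDescriptor {
    final String function;      // function name, e.g. "+"
    final List<ArgKind> kinds;  // expression kind of each argument, in argument order
    final List<String> types;   // data type name of each argument, in argument order

    SketchDescriptor(String function, List<ArgKind> kinds, List<String> types) {
        this.function = function;
        this.kinds = kinds;
        this.types = types;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SketchDescriptor)) {
            return false;
        }
        SketchDescriptor d = (SketchDescriptor) o;
        return function.equals(d.function) && kinds.equals(d.kinds) && types.equals(d.types);
    }

    @Override
    public int hashCode() {
        return Objects.hash(function, kinds, types);
    }
}

final class SketchRegistry {
    private final Map<SketchDescriptor, String> classByDescriptor = new HashMap<>();

    void register(SketchDescriptor descriptor, String className) {
        classByDescriptor.put(descriptor, className);
    }

    // Exact match on function, argument kinds, and argument types (order matters);
    // returns null when no specialized class is registered.
    String match(SketchDescriptor wanted) {
        return classByDescriptor.get(wanted);
    }
}
```

Because (COLUMN, SCALAR) and (SCALAR, COLUMN) are different keys in this sketch, argument order is part of the match, which mirrors why constructor parameters must follow the generic function's order.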
Type information is remembered for the input parameters and the output type.
A vector expression has optional child vector expressions when one or more parameters need to be calculated into vector scratch columns. Columns and constants do not need child expressions.
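To make the child/scratch-column relationship concrete, here is a minimal sketch that evaluates (col0 + col1) * 10. It assumes a toy "batch" made of plain long[] columns; TreeExprSketch, AddColColSketch, and MulColScalarSketch are hypothetical names, not Hive classes. The addition is a child because its result must be computed into scratch column 2 before the multiplication can read it; the plain columns 0 and 1 and the constant 10 need no children.

```java
// Hypothetical, simplified model of a vector expression tree; NOT Hive's API.
// A "batch" is modeled as an array of long[] columns, each holding `size` rows.
abstract class TreeExprSketch {
    final TreeExprSketch[] children; // only parameters that must be computed
    final int outputColumn;

    TreeExprSketch(int outputColumn, TreeExprSketch... children) {
        this.outputColumn = outputColumn;
        this.children = children;
    }

    // Children fill their scratch columns before the parent reads them.
    final void evaluateChildren(long[][] batch, int size) {
        for (TreeExprSketch child : children) {
            child.evaluate(batch, size);
        }
    }

    abstract void evaluate(long[][] batch, int size);
}

// col[in1] + col[in2] -> col[out]; both inputs are plain columns, so no children.
final class AddColColSketch extends TreeExprSketch {
    final int in1;
    final int in2;

    AddColColSketch(int in1, int in2, int out) {
        super(out);
        this.in1 = in1;
        this.in2 = in2;
    }

    @Override
    void evaluate(long[][] batch, int size) {
        evaluateChildren(batch, size);
        for (int i = 0; i < size; i++) {
            batch[outputColumn][i] = batch[in1][i] + batch[in2][i];
        }
    }
}

// col[in] * scalar -> col[out]; `in` may be a scratch column written by a child.
final class MulColScalarSketch extends TreeExprSketch {
    final int in;
    final long scalar;

    MulColScalarSketch(int in, long scalar, int out, TreeExprSketch... children) {
        super(out, children);
        this.in = in;
        this.scalar = scalar;
    }

    @Override
    void evaluate(long[][] batch, int size) {
        evaluateChildren(batch, size);
        for (int i = 0; i < size; i++) {
            batch[outputColumn][i] = batch[in][i] * scalar;
        }
    }
}
```

Usage: new MulColScalarSketch(2, 10, 3, new AddColColSketch(0, 1, 2)) reads scratch column 2 and writes the final result to column 3.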
How to extend VectorExpression (some basic steps and hints):
1. Create a subclass and write a proper getDescriptor() (column or scalar? number of arguments? etc.).
2. Define an explicit parameterless constructor.
3. Define a proper parameterized constructor (according to the descriptor).
4. In the case of a UDF, add the non-vectorized UDF class to Vectorizer.supported*UDFs.
5. Add the new vectorized expression class to the VectorizedExpressions annotation of the original UDF.
6. If you subclass an expression, repeat steps 2, 3, and 5 for the subclasses as well (constructors).
7. If your base expression class is abstract, do not add it to the VectorizedExpressions annotation.
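Steps 1 to 3 above can be sketched against a stand-in base class. BaseExprSketch and AbsLongColSketch are hypothetical; in Hive the base class is VectorExpression and getDescriptor() returns a VectorExpressionDescriptor.Descriptor, which this sketch replaces with a plain string. The list does not say why the parameterless constructor is needed; reflective instantiation or plan deserialization is a plausible reason, but that is an assumption. Steps 4 to 7 concern Hive's registration (Vectorizer and the VectorizedExpressions annotation) and are not modeled here.

```java
// Hypothetical, simplified base class; NOT Hive's VectorExpression.
abstract class BaseExprSketch {
    protected int[] inputColumnNum;
    protected int outputColumnNum;

    BaseExprSketch() {
        this(new int[0], -1); // "unset" defaults for the parameterless form
    }

    BaseExprSketch(int[] inputColumnNum, int outputColumnNum) {
        this.inputColumnNum = inputColumnNum;
        this.outputColumnNum = outputColumnNum;
    }

    // Step 1: declare the argument kinds, count, and types this class handles.
    abstract String getDescriptor();

    abstract void evaluate(long[][] batch, int size);
}

// A concrete unary expression: abs(col[in]) -> col[out].
final class AbsLongColSketch extends BaseExprSketch {
    // Step 2: explicit parameterless constructor.
    AbsLongColSketch() {
        super();
    }

    // Step 3: parameterized constructor matching the descriptor (one column input).
    AbsLongColSketch(int inputColumn, int outputColumn) {
        super(new int[] {inputColumn}, outputColumn);
    }

    @Override
    String getDescriptor() {
        return "abs(COLUMN:bigint)"; // illustrative string, not Hive's format
    }

    @Override
    void evaluate(long[][] batch, int size) {
        int in = inputColumnNum[0];
        for (int i = 0; i < size; i++) {
            batch[outputColumnNum][i] = Math.abs(batch[in][i]);
        }
    }
}
```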
| Modifier and Type | Field and Description |
|---|---|
| protected VectorExpression[] | childExpressions: Child expressions for parameters, but only those that need to be computed. |
| int[] | inputColumnNum: Input column numbers of the vector expression, which should be reused by vector expressions. |
| protected DataTypePhysicalVariation[] | inputDataTypePhysicalVariations |
| protected TypeInfo[] | inputTypeInfos: All input parameter type information is here, including that for (non-computed) columns and scalar values. |
| protected org.slf4j.Logger | LOG |
| int | outputColumnNum: Output column number and type information of the vector expression. |
| protected DataTypePhysicalVariation | outputDataTypePhysicalVariation |
| protected TypeInfo | outputTypeInfo |
| Constructor and Description |
|---|
| VectorExpression() |
| VectorExpression(int[] inputColumnNum, int outputColumnNum): Convenience constructor for expressions that use an arbitrary number of input columns in an array. |
| VectorExpression(int inputColumnNum, int outputColumnNum): Constructor for 1 input column and 1 output column. |
| VectorExpression(int inputColumnNum, int inputColumnNum2, int outputColumnNum): Constructor for 2 input columns and 1 output column. |
| VectorExpression(int inputColumnNum, int inputColumnNum2, int inputColumnNum3, int outputColumnNum): Constructor for 3 input columns and 1 output column. |
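The fixed-arity constructors and the int[] constructor cover the same information, since inputColumnNum is stored as an int[] field. A natural way to relate them is for the 1-, 2-, and 3-column forms to delegate to the array form; the sketch below shows that pattern with a hypothetical ColumnsHolderSketch class. Whether Hive's constructors are written exactly this way is an assumption.

```java
// Hypothetical sketch: fixed-arity constructors funnel into the int[] form.
// Not Hive's actual constructor code.
class ColumnsHolderSketch {
    final int[] inputColumnNum;
    final int outputColumnNum;

    ColumnsHolderSketch(int[] inputColumnNum, int outputColumnNum) {
        this.inputColumnNum = inputColumnNum.clone(); // defensive copy of caller's array
        this.outputColumnNum = outputColumnNum;
    }

    // 1 input column, 1 output column.
    ColumnsHolderSketch(int in, int out) {
        this(new int[] {in}, out);
    }

    // 2 input columns, 1 output column.
    ColumnsHolderSketch(int in, int in2, int out) {
        this(new int[] {in, in2}, out);
    }

    // 3 input columns, 1 output column.
    ColumnsHolderSketch(int in, int in2, int in3, int out) {
        this(new int[] {in, in2, in3}, out);
    }
}
```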
| Modifier and Type | Method and Description |
|---|---|
| static String | displayArrayOfUtf8ByteArrays(byte[][] arrayOfByteArrays) |
| static String | displayUtf8Bytes(byte[] bytes) |
| static void | doTransientInit(VectorExpression[] vecExprs, org.apache.hadoop.conf.Configuration conf) |
| static void | doTransientInit(VectorExpression vecExpr, org.apache.hadoop.conf.Configuration conf) |
| abstract void | evaluate(VectorizedRowBatch batch): This is the primary method to implement expression logic. |
| protected void | evaluateChildren(VectorizedRowBatch vrg): Evaluate the child expressions on the given input batch. |
| VectorExpression[] | getChildExpressions() |
| protected Collection&lt;VectorExpression&gt; | getChildExpressionsForTransientInit() |
| protected String | getColumnParamString(int typeNum, int columnNum) |
| abstract VectorExpressionDescriptor.Descriptor | getDescriptor() |
| protected String | getDoubleValueParamString(int typeNum, double value) |
| DataTypePhysicalVariation[] | getInputDataTypePhysicalVariations() |
| TypeInfo[] | getInputTypeInfos() |
| protected String | getLongValueParamString(int typeNum, long value) |
| int | getOutputColumnNum(): Returns the index of the output column in the array of column vectors. |
| ColumnVector.Type | getOutputColumnVectorType() |
| DataTypePhysicalVariation | getOutputDataTypePhysicalVariation() |
| TypeInfo | getOutputTypeInfo(): Returns the type of the output column. |
| protected String | getParamTypeString(int typeNum) |
| static String | getTypeName(TypeInfo typeInfo, DataTypePhysicalVariation dataTypePhysicalVariation) |
| void | init(org.apache.hadoop.conf.Configuration conf) |
| void | setChildExpressions(VectorExpression[] childExpressions): Initialize the child expressions. |
| void | setInputDataTypePhysicalVariations(DataTypePhysicalVariation... inputDataTypePhysicalVariations) |
| void | setInputTypeInfos(TypeInfo... inputTypeInfos) |
| void | setOutputDataTypePhysicalVariation(DataTypePhysicalVariation outputDataTypePhysicalVariation): Set the data type physical variation of the output column. |
| void | setOutputTypeInfo(TypeInfo outputTypeInfo): Set the type of the output column. |
| boolean | shouldConvertDecimal64ToDecimal(): By default, vector expressions do not handle Decimal64 types, which should be converted into Decimal types if the expression's output cannot handle Decimal64. |
| boolean | supportsCheckedExecution(): A vector expression that implements checked execution to account for overflow handling should override this method and return true. |
| String | toString() |
| void | transientInit(org.apache.hadoop.conf.Configuration conf) |
| abstract String | vectorExpressionParameters() |
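Since evaluate(VectorizedRowBatch) is the primary method to implement, here is a sketch of the loop shape such a method typically has. SketchLongColumn and SketchBatch are simplified stand-ins that only mirror a few field names of Hive's LongColumnVector and VectorizedRowBatch (vector, isNull, noNulls, isRepeating, size, selected, selectedInUse); they are not the real classes, and AddScalarSketch is a hypothetical "column plus scalar" expression.

```java
// Simplified stand-ins mirroring a few Hive field names; NOT the real classes.
final class SketchLongColumn {
    long[] vector;
    boolean[] isNull;
    boolean noNulls = true;      // when true, isNull need not be consulted
    boolean isRepeating = false; // when true, entry 0 holds the value for every row

    SketchLongColumn(int capacity) {
        vector = new long[capacity];
        isNull = new boolean[capacity];
    }
}

final class SketchBatch {
    final SketchLongColumn[] cols;
    int size;              // number of live rows
    final int[] selected;  // live row indices, valid only when selectedInUse is true
    boolean selectedInUse;

    SketchBatch(int capacity, SketchLongColumn... cols) {
        this.cols = cols;
        this.selected = new int[capacity];
    }
}

// Hypothetical evaluate() for "col[in] + scalar -> col[out]".
final class AddScalarSketch {
    private final int in;
    private final long scalar;
    private final int out;

    AddScalarSketch(int in, long scalar, int out) {
        this.in = in;
        this.scalar = scalar;
        this.out = out;
    }

    void evaluate(SketchBatch batch) {
        if (batch.size == 0) {
            return;
        }
        SketchLongColumn input = batch.cols[in];
        SketchLongColumn output = batch.cols[out];
        output.noNulls = input.noNulls;

        if (input.isRepeating) {
            // Every row shares entry 0: compute it once and mark the output repeating.
            output.isRepeating = true;
            output.isNull[0] = !input.noNulls && input.isNull[0];
            if (!output.isNull[0]) {
                output.vector[0] = input.vector[0] + scalar;
            }
            return;
        }

        output.isRepeating = false;
        for (int j = 0; j < batch.size; j++) {
            // With selectedInUse, only the rows listed in `selected` are processed.
            int i = batch.selectedInUse ? batch.selected[j] : j;
            output.isNull[i] = !input.noNulls && input.isNull[i];
            if (!output.isNull[i]) {
                output.vector[i] = input.vector[i] + scalar;
            }
        }
    }
}
```

Handling the repeating case separately avoids touching every row when a column holds a single value, and honoring the selected array skips rows that an earlier filter has already removed.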
protected final transient org.slf4j.Logger LOG
protected VectorExpression[] childExpressions
NOTE: Columns and constants are not included in the children. That is: column numbers and scalar values are passed via the constructor and remembered by the individual vector expression classes. They are not represented in the children.
protected TypeInfo[] inputTypeInfos
The vectorExpressionParameters() method is used to get the displayable string for the parameters used by EXPLAIN, logging, etc.
protected DataTypePhysicalVariation[] inputDataTypePhysicalVariations
public int outputColumnNum
protected TypeInfo outputTypeInfo
protected DataTypePhysicalVariation outputDataTypePhysicalVariation
public int[] inputColumnNum
public VectorExpression()
public VectorExpression(int inputColumnNum,
int outputColumnNum)
public VectorExpression(int inputColumnNum,
int inputColumnNum2,
int outputColumnNum)
public VectorExpression(int inputColumnNum,
int inputColumnNum2,
int inputColumnNum3,
int outputColumnNum)
public VectorExpression(int[] inputColumnNum,
int outputColumnNum)
public void setChildExpressions(VectorExpression[] childExpressions)
public VectorExpression[] getChildExpressions()
protected Collection<VectorExpression> getChildExpressionsForTransientInit()
public void setInputTypeInfos(TypeInfo... inputTypeInfos)
public TypeInfo[] getInputTypeInfos()
public void setInputDataTypePhysicalVariations(DataTypePhysicalVariation... inputDataTypePhysicalVariations)
public DataTypePhysicalVariation[] getInputDataTypePhysicalVariations()
public abstract String vectorExpressionParameters()
public void transientInit(org.apache.hadoop.conf.Configuration conf) throws HiveException
public static void doTransientInit(VectorExpression vecExpr, org.apache.hadoop.conf.Configuration conf) throws HiveException
public static void doTransientInit(VectorExpression[] vecExprs, org.apache.hadoop.conf.Configuration conf) throws HiveException
public int getOutputColumnNum()
public TypeInfo getOutputTypeInfo()
public void setOutputTypeInfo(TypeInfo outputTypeInfo)
public void setOutputDataTypePhysicalVariation(DataTypePhysicalVariation outputDataTypePhysicalVariation)
public DataTypePhysicalVariation getOutputDataTypePhysicalVariation()
public ColumnVector.Type getOutputColumnVectorType() throws HiveException
public abstract void evaluate(VectorizedRowBatch batch) throws HiveException
public void init(org.apache.hadoop.conf.Configuration conf)
public abstract VectorExpressionDescriptor.Descriptor getDescriptor()
protected final void evaluateChildren(VectorizedRowBatch vrg) throws HiveException
Parameters: vrg - VectorizedRowBatch
protected String getColumnParamString(int typeNum, int columnNum)
protected String getLongValueParamString(int typeNum, long value)
protected String getDoubleValueParamString(int typeNum, double value)
protected String getParamTypeString(int typeNum)
public static String getTypeName(TypeInfo typeInfo, DataTypePhysicalVariation dataTypePhysicalVariation)
public boolean supportsCheckedExecution()
public static String displayUtf8Bytes(byte[] bytes)
public static String displayArrayOfUtf8ByteArrays(byte[][] arrayOfByteArrays)
public boolean shouldConvertDecimal64ToDecimal()
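The documentation only says that supportsCheckedExecution() marks expressions with checked execution "to account for overflow handling". The sketch below illustrates one interpretation, which is an assumption: INT values held in 64-bit long vectors can be added without overflowing, so a checked variant narrows each sum back to 32 bits to reproduce int wraparound. IntAddSketch and its methods are hypothetical, not Hive code.

```java
// Hypothetical sketch contrasting unchecked and checked addition for INT values
// stored in long[] vectors. Assumption: "checked" means reproducing 32-bit int
// overflow behavior. NOT Hive's implementation.
final class IntAddSketch {
    // Unchecked: adds in 64 bits, so an INT sum may fall outside the INT range.
    static void addUnchecked(long[] a, long[] b, long[] out, int size) {
        for (int i = 0; i < size; i++) {
            out[i] = a[i] + b[i];
        }
    }

    // Checked: narrows each 64-bit sum to int, giving the 32-bit wrapped result.
    static void addChecked(long[] a, long[] b, long[] out, int size) {
        for (int i = 0; i < size; i++) {
            out[i] = (int) (a[i] + b[i]);
        }
    }
}
```

For example, Integer.MAX_VALUE + 1 yields 2147483648 from addUnchecked but Integer.MIN_VALUE from addChecked; the extra narrowing per row is presumably why checked execution is opt-in.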
Copyright © 2024 The Apache Software Foundation. All rights reserved.