public class CoercionUtils
extends java.lang.Object
| Modifier and Type | Method and Description |
|---|---|
| static java.util.Map<java.lang.String,java.lang.Float> | coerceToVector(java.lang.Object item): Coerce item to a term vector map, inferring the feature type from the item's type. |
| static java.util.Map<java.lang.String,java.lang.Float> | coerceToVector(java.lang.Object item, FeatureTypes featureType): Coerce an item to term vector format according to the provided FeatureTypes. |
| static FeatureTypes | getCoercedFeatureType(java.lang.Object item): Get the feature type that the input item would be coerced to. |
| static boolean | isBoolean(FeatureValue featureValue): Returns true if the input FeatureValue is a boolean feature and false otherwise. Boolean features, when represented as a term vector, have the form {""=1.0f} (true) or an empty map (false). |
| static boolean | isCategorical(FeatureValue featureValue): Returns true if the input FeatureValue is a categorical feature and false otherwise. Categorical features, when represented as a term vector, have the form {"term"=1.0f}. |
| static boolean | isNumeric(FeatureValue featureValue): Returns true if the input FeatureValue is a numeric feature and false otherwise. Numeric features, when represented as a term vector, have the form {""=3.0f}. |
| static float | safeToFloat(java.lang.Object item): Safely convert an input object to its float representation. |
| static java.lang.String | safeToString(java.lang.Object item): Safely convert an input object into its string representation. |
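The term-vector forms named in the summary can be illustrated on plain maps. The following is a minimal sketch only: TermVectorForms and its looksBoolean, looksNumeric and looksCategorical helpers are hypothetical names, they inspect a Map<String, Float> rather than a FeatureValue, and the real isBoolean, isNumeric and isCategorical methods may apply different or stricter checks. The sketch also assumes that a categorical term is a non-empty string.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical helper, not part of CoercionUtils: it checks the documented
// term-vector shapes on plain maps instead of FeatureValue objects.
class TermVectorForms {
    // The "unit" dimension is represented by the empty string.
    static final String UNIT = "";

    // Boolean form: { "": 1.0f } for true, empty map for false.
    static boolean looksBoolean(Map<String, Float> termVector) {
        if (termVector.isEmpty()) {
            return true;
        }
        return termVector.size() == 1 && Float.valueOf(1.0f).equals(termVector.get(UNIT));
    }

    // Numeric form: exactly one entry, keyed by the empty string.
    static boolean looksNumeric(Map<String, Float> termVector) {
        return termVector.size() == 1 && termVector.containsKey(UNIT);
    }

    // Categorical form: exactly one named term with value 1.0f.
    // Assumption of this sketch: the term must be a non-empty string.
    static boolean looksCategorical(Map<String, Float> termVector) {
        if (termVector.size() != 1) {
            return false;
        }
        Map.Entry<String, Float> entry = termVector.entrySet().iterator().next();
        return !entry.getKey().isEmpty() && Float.valueOf(1.0f).equals(entry.getValue());
    }

    public static void main(String[] args) {
        Map<String, Float> numeric = Collections.singletonMap(UNIT, 3.0f);
        Map<String, Float> categorical = Collections.singletonMap("term", 1.0f);
        Map<String, Float> boolTrue = Collections.singletonMap(UNIT, 1.0f);

        System.out.println("numeric form:     " + looksNumeric(numeric));
        System.out.println("categorical form: " + looksCategorical(categorical));
        System.out.println("boolean form:     " + looksBoolean(boolTrue));
        // { "": 1.0f } has both the boolean and the numeric shape.
        System.out.println("boolean true also numeric-shaped: " + looksNumeric(boolTrue));
    }
}
```

Note that { "": 1.0f } fits both the boolean and the numeric shape, so the shape of a map alone cannot always tell the two apart.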
public static java.util.Map<java.lang.String,java.lang.Float> coerceToVector(java.lang.Object item, FeatureTypes featureType)
Coerce an item to term vector format according to the provided FeatureTypes.
General rule for the dimension (terms) and value in the resulting term-vector:
1. Dimensions (terms): Character (as is), CharSequence (as is) and Number (whole numbers only)
2. Values: Numbers
The rules for the supported FeatureTypes are as follows:
FeatureTypes.BOOLEAN accepts Boolean as input and is interpreted as a scalar feature value
with the following encoding:
true => { "": 1.0f }
false => { } (empty map)
FeatureTypes.NUMERIC accepts Number as input and is interpreted as a scalar value without any
names/dimensions. The encoding produces vectors with a single "unit" dimension, represented by the empty string,
whose value is the input as a float.
0.12345f => { "": 0.12345f }
100.01d => { "": 100.01f }
BigDecimal.TEN => { "": 10f }
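The BOOLEAN and NUMERIC encodings above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: ScalarEncodingSketch, encodeBoolean and encodeNumeric are hypothetical names, and the sketch assumes the float value is obtained with Number.floatValue().

```java
import java.math.BigDecimal;
import java.util.Collections;
import java.util.Map;

// Hypothetical illustration of the documented scalar encodings;
// this is not the CoercionUtils code.
class ScalarEncodingSketch {
    // The "unit" dimension is represented by the empty string.
    static final String UNIT = "";

    // true => { "": 1.0f }, false => { } (empty map)
    static Map<String, Float> encodeBoolean(boolean value) {
        if (value) {
            return Collections.singletonMap(UNIT, 1.0f);
        }
        return Collections.emptyMap();
    }

    // Any Number => { "": <float value> }
    // Assumption: the conversion uses Number.floatValue().
    static Map<String, Float> encodeNumeric(Number value) {
        return Collections.singletonMap(UNIT, value.floatValue());
    }

    public static void main(String[] args) {
        System.out.println(encodeBoolean(true));
        System.out.println(encodeBoolean(false));
        System.out.println(encodeNumeric(0.12345f));
        System.out.println(encodeNumeric(BigDecimal.TEN));
    }
}
```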
{@link FeatureTypes#DENSE_VECTOR} accepts {@link List
java.lang.RuntimeException - when the feature type doesn't match its expected data format
public static java.util.Map<java.lang.String,java.lang.Float> coerceToVector(java.lang.Object item)
Coerce item to a term vector map, inferring the feature type from the item's type.
Basic rules:
1. FeatureTypes.NUMERIC
2. Treat single string as categorical FeatureTypes.CATEGORICAL
3. Treat vector of numbers (int, float, double, etc) as FeatureTypes.DENSE_VECTOR
4. Treat a collection of strings as FeatureTypes.CATEGORICAL_SET
5. Treat map or list of maps as FeatureTypes.TERM_VECTOR
6. Treat FeatureValue as FeatureTypes.TERM_VECTOR
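The inference rules listed above can be sketched as a chain of type checks. This is a rough sketch, not the library's logic: InferenceSketch, Kind and infer are hypothetical names, and Kind is a local stand-in for the FeatureTypes constants. The text of rule 1 is cut off in this reference, so mapping a single Number to NUMERIC is an assumption. Rule 6 is left out because FeatureValue is not available here, and throwing on empty or mixed collections is a choice of this sketch only.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of type-based inference; not the CoercionUtils code.
class InferenceSketch {
    // Local stand-in for the FeatureTypes constants named in the rules.
    enum Kind { NUMERIC, CATEGORICAL, DENSE_VECTOR, CATEGORICAL_SET, TERM_VECTOR }

    static Kind infer(Object item) {
        if (item instanceof Number) {
            return Kind.NUMERIC;            // assumed reading of rule 1
        }
        if (item instanceof CharSequence) {
            return Kind.CATEGORICAL;        // rule 2: single string
        }
        if (item instanceof Map) {
            return Kind.TERM_VECTOR;        // rule 5: a map
        }
        if (item instanceof Collection) {
            Collection<?> items = (Collection<?>) item;
            if (!items.isEmpty()) {
                if (allInstancesOf(items, Number.class)) {
                    return Kind.DENSE_VECTOR;     // rule 3: vector of numbers
                }
                if (allInstancesOf(items, CharSequence.class)) {
                    return Kind.CATEGORICAL_SET;  // rule 4: collection of strings
                }
                if (allInstancesOf(items, Map.class)) {
                    return Kind.TERM_VECTOR;      // rule 5: list of maps
                }
            }
        }
        // Sketch-only choice: anything else (including empty collections) is rejected.
        throw new IllegalArgumentException("Cannot infer a feature type for: " + item);
    }

    // True when every element is a non-null instance of the given type.
    static boolean allInstancesOf(Collection<?> items, Class<?> type) {
        for (Object element : items) {
            if (!type.isInstance(element)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(infer(3.5));
        System.out.println(infer("red"));
        System.out.println(infer(Arrays.asList(1, 2.0f, 3.0d)));
        System.out.println(infer(Arrays.asList("a", "b")));
        System.out.println(infer(Collections.singletonMap("a", 1.0f)));
    }
}
```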
The function may be used to handle default values from a configuration JSON file, and to handle field values
extracted by MVEL expressions.
public static FeatureTypes getCoercedFeatureType(java.lang.Object item)
Get the feature type that the input item would be coerced to.
item - input item to coerce
public static boolean isNumeric(FeatureValue featureValue)
Returns true if the input FeatureValue is a numeric feature and false otherwise.
Numeric features, when represented as a term vector, have the form: {""=3.0f}
public static boolean isBoolean(FeatureValue featureValue)
Returns true if the input FeatureValue is a boolean feature and false otherwise.
Boolean features, when represented as a term vector, have the form: {""=1.0f} (true) or an empty map (false)
public static boolean isCategorical(FeatureValue featureValue)
Returns true if the input FeatureValue is a categorical feature and false otherwise.
Categorical features, when represented as a term vector, have the form: {"term"=1.0f}
public static java.lang.String safeToString(java.lang.Object item)
Safely convert an input object into its string representation.
CharSequence and Character are simply converted to String
Number is converted to String from its longValue()
java.lang.RuntimeException - if the input Number is not a whole number within some precision
public static float safeToFloat(java.lang.Object item)
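As an illustration of the safeToString rules stated above, the sketch below converts CharSequence and Character values as they are, and converts a Number through longValue() only when it is a whole number. It is a sketch under assumptions, not the library code: SafeToStringSketch, toTerm and EPSILON are hypothetical names, the tolerance of 1e-6 is invented for the example (the reference only says "within some precision"), and rejecting unsupported types with a RuntimeException is also an assumption.

```java
// Hypothetical sketch of the documented safeToString rules;
// this is not the CoercionUtils code.
class SafeToStringSketch {
    // Assumed tolerance; the reference only says "within some precision".
    static final double EPSILON = 1e-6;

    static String toTerm(Object item) {
        if (item instanceof CharSequence || item instanceof Character) {
            // Text-like inputs are used as they are.
            return item.toString();
        }
        if (item instanceof Number) {
            Number number = (Number) item;
            long whole = number.longValue();
            // Reject values that are not (almost) equal to their long value.
            // Written as a negated <= so that NaN is rejected as well.
            if (!(Math.abs(number.doubleValue() - whole) <= EPSILON)) {
                throw new RuntimeException("Not a whole number: " + number);
            }
            return Long.toString(whole);
        }
        // Assumption of this sketch: other input types are rejected.
        throw new RuntimeException("Unsupported input type: " + item);
    }

    public static void main(String[] args) {
        System.out.println(toTerm("abc"));
        System.out.println(toTerm('x'));
        System.out.println(toTerm(42));
        System.out.println(toTerm(3.0d));
    }
}
```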