public class TargetEncoder extends water.Iced<TargetEncoder>
| Modifier and Type | Class and Description |
|---|---|
| static class | TargetEncoder.AddNoiseTask |
| static class | TargetEncoder.DataLeakageHandlingStrategy |
| static class | TargetEncoder.SubtractCurrentRowForLeaveOneOutTask |
| Modifier and Type | Field and Description |
|---|---|
| static BlendingParams | DEFAULT_BLENDING_PARAMS |
| static java.lang.String | DENOMINATOR_COL_NAME |
| static java.lang.String | ENCODED_COLUMN_POSTFIX |
| static java.lang.String | NUMERATOR_COL_NAME |
| Constructor and Description |
|---|
| TargetEncoder(java.lang.String[] columnNamesToEncode) |
| Modifier and Type | Method and Description |
|---|---|
| water.fvec.Frame | applyTargetEncoding(water.fvec.Frame data, java.lang.String targetColumnName, java.util.Map<java.lang.String,water.fvec.Frame> targetEncodingMap, TargetEncoder.DataLeakageHandlingStrategy dataLeakageHandlingStrategy, boolean withBlendedAvg, boolean imputeNAsWithNewCategory, BlendingParams blendingParams, long seed) |
| water.fvec.Frame | applyTargetEncoding(water.fvec.Frame data, java.lang.String targetColumnName, java.util.Map<java.lang.String,water.fvec.Frame> targetEncodingMap, TargetEncoder.DataLeakageHandlingStrategy dataLeakageHandlingStrategy, boolean withBlendedAvg, double noiseLevel, boolean imputeNAsWithNewCategory, BlendingParams blendingParams, long seed) |
| water.fvec.Frame | applyTargetEncoding(water.fvec.Frame data, java.lang.String targetColumnName, java.util.Map<java.lang.String,water.fvec.Frame> targetEncodingMap, TargetEncoder.DataLeakageHandlingStrategy dataLeakageHandlingStrategy, java.lang.String foldColumn, boolean withBlendedAvg, boolean imputeNAsWithNewCategory, BlendingParams blendingParams, long seed) |
| water.fvec.Frame | applyTargetEncoding(water.fvec.Frame data, java.lang.String targetColumnName, java.util.Map<java.lang.String,water.fvec.Frame> columnToEncodingMap, TargetEncoder.DataLeakageHandlingStrategy dataLeakageHandlingStrategy, java.lang.String foldColumnName, boolean withBlendedAvg, double noiseLevel, boolean imputeNAsWithNewCategory, BlendingParams blendingParams, long seed) |
| water.fvec.Frame | applyTargetEncoding(water.fvec.Frame data, java.lang.String targetColumnName, java.util.Map<java.lang.String,water.fvec.Frame> columnToEncodingMap, TargetEncoder.DataLeakageHandlingStrategy dataLeakageHandlingStrategy, java.lang.String foldColumnName, boolean useBlending, double noiseLevel, long seed, water.Key<water.fvec.Frame> encodedFrameKey, BlendingParams blendingParams) Core method for applying pre-calculated encodings to the dataset. |
| water.fvec.Frame | applyTargetEncoding(water.fvec.Frame data, java.lang.String targetColumnName, java.util.Map<java.lang.String,water.fvec.Frame> targetEncodingMap, TargetEncoder.DataLeakageHandlingStrategy dataLeakageHandlingStrategy, java.lang.String foldColumn, boolean withBlendedAvg, long seed, boolean imputeNAsWithNewCategory, water.Key<water.fvec.Frame> encodedColumnName, BlendingParams blendingParams) |
| static water.fvec.Frame | groupingIgnoringFoldColumn(java.lang.String foldColumnName, water.fvec.Frame targetEncodingMap, java.lang.String teColumnName) |
| water.util.IcedHashMapGeneric<java.lang.String,water.fvec.Frame> | prepareEncodingMap(water.fvec.Frame data, java.lang.String targetColumnName, java.lang.String foldColumnName) |
| water.util.IcedHashMapGeneric<java.lang.String,water.fvec.Frame> | prepareEncodingMap(water.fvec.Frame data, java.lang.String targetColumnName, java.lang.String foldColumnName, boolean imputeNAsWithNewCategory) |
public static final java.lang.String ENCODED_COLUMN_POSTFIX
public static final BlendingParams DEFAULT_BLENDING_PARAMS
public static java.lang.String NUMERATOR_COL_NAME
public static java.lang.String DENOMINATOR_COL_NAME
public TargetEncoder(java.lang.String[] columnNamesToEncode)
columnNamesToEncode - names of columns to apply target encoding to
public water.util.IcedHashMapGeneric<java.lang.String,water.fvec.Frame> prepareEncodingMap(water.fvec.Frame data,
java.lang.String targetColumnName,
java.lang.String foldColumnName,
boolean imputeNAsWithNewCategory)
targetColumnName - name of the target column.
foldColumnName - name of the column that contains the fold number each row belongs to.
imputeNAsWithNewCategory - set to `true` to impute NAs with a new category. TODO: this should probably always be `true`, because null values are not supported on the right side of the merge operation.
public water.util.IcedHashMapGeneric<java.lang.String,water.fvec.Frame> prepareEncodingMap(water.fvec.Frame data,
java.lang.String targetColumnName,
java.lang.String foldColumnName)
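To illustrate what an encoding map might hold, the following plain-Java sketch accumulates a numerator and a denominator per category, echoing the NUMERATOR_COL_NAME and DENOMINATOR_COL_NAME fields. It does not call the H2O API. The class `EncodingMapSketch`, the method `buildEncodingMap`, and the `"NA"` level name are hypothetical, and it is an assumption (not stated on this page) that the numerator is the sum of target values and the denominator is the row count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingMapSketch {

    /**
     * Returns per-category totals, where index 0 is the sum of target values
     * (numerator) and index 1 is the number of rows (denominator).
     * Hypothetical helper; not part of the H2O API.
     */
    static Map<String, double[]> buildEncodingMap(String[] categories, double[] target) {
        Map<String, double[]> totals = new LinkedHashMap<>();
        for (int i = 0; i < categories.length; i++) {
            // Assumption: a missing category is imputed with a dedicated new level,
            // similar in spirit to imputeNAsWithNewCategory = true.
            String level = categories[i] == null ? "NA" : categories[i];
            double[] t = totals.computeIfAbsent(level, k -> new double[2]);
            t[0] += target[i]; // numerator: running sum of the target
            t[1] += 1;         // denominator: running row count
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] categories = {"a", "b", "a", null, "b", "a"};
        double[] target = {1, 0, 1, 1, 1, 0};
        buildEncodingMap(categories, target).forEach((level, t) ->
            System.out.println(level + ": numerator=" + t[0] + ", denominator=" + t[1]));
    }
}
```

Keeping the two totals separate, rather than storing the mean directly, is what would let a later step adjust them per row (for example, by removing the current row's contribution) before dividing.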
public water.fvec.Frame applyTargetEncoding(water.fvec.Frame data,
java.lang.String targetColumnName,
java.util.Map<java.lang.String,water.fvec.Frame> columnToEncodingMap,
TargetEncoder.DataLeakageHandlingStrategy dataLeakageHandlingStrategy,
java.lang.String foldColumnName,
boolean withBlendedAvg,
double noiseLevel,
boolean imputeNAsWithNewCategory,
BlendingParams blendingParams,
long seed)
public water.fvec.Frame applyTargetEncoding(water.fvec.Frame data,
java.lang.String targetColumnName,
java.util.Map<java.lang.String,water.fvec.Frame> columnToEncodingMap,
TargetEncoder.DataLeakageHandlingStrategy dataLeakageHandlingStrategy,
java.lang.String foldColumnName,
boolean useBlending,
double noiseLevel,
long seed,
water.Key<water.fvec.Frame> encodedFrameKey,
BlendingParams blendingParams)
data - dataset that will be used as a base for the creation of encodings.
targetColumnName - name of the column with respect to which the encodings were computed.
columnToEncodingMap - map of the prepared encodings, keyed by column name.
dataLeakageHandlingStrategy - see TargetEncoder.DataLeakageHandlingStrategy. TODO: use a common interface for stronger type safety.
foldColumnName - name of the column that contains the fold number each row belongs to.
useBlending - whether to apply blending.
noiseLevel - amount of noise to add to the final encodings.
seed - seed for the random number generator; a fixed value makes results reproducible, for example in tests.
public static water.fvec.Frame groupingIgnoringFoldColumn(java.lang.String foldColumnName,
water.fvec.Frame targetEncodingMap,
java.lang.String teColumnName)
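Judging by its name alone, groupingIgnoringFoldColumn collapses a per-fold encoding map into one row per category. The plain-Java sketch below shows one way such a grouping could work, by summing numerators and denominators across folds. This behavior is an assumption, since this page does not describe the method, and `FoldGroupingSketch`, `FoldRow`, and `groupIgnoringFold` are hypothetical names that are not part of the H2O API.

```java
import java.util.Map;
import java.util.TreeMap;

public class FoldGroupingSketch {

    /** One row of a hypothetical per-fold encoding map. */
    static final class FoldRow {
        final String level;
        final int fold;
        final double numerator;
        final double denominator;

        FoldRow(String level, int fold, double numerator, double denominator) {
            this.level = level;
            this.fold = fold;
            this.numerator = numerator;
            this.denominator = denominator;
        }
    }

    /**
     * Sums numerators (index 0) and denominators (index 1) per category level,
     * ignoring which fold each row came from.
     */
    static Map<String, double[]> groupIgnoringFold(FoldRow[] rows) {
        Map<String, double[]> totals = new TreeMap<>();
        for (FoldRow r : rows) {
            double[] t = totals.computeIfAbsent(r.level, k -> new double[2]);
            t[0] += r.numerator;
            t[1] += r.denominator;
        }
        return totals;
    }

    public static void main(String[] args) {
        FoldRow[] rows = {
            new FoldRow("a", 0, 1, 2),
            new FoldRow("a", 1, 1, 1),
            new FoldRow("b", 0, 0, 1),
        };
        groupIgnoringFold(rows).forEach((level, t) ->
            System.out.println(level + ": numerator=" + t[0] + ", denominator=" + t[1]));
    }
}
```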
public water.fvec.Frame applyTargetEncoding(water.fvec.Frame data,
java.lang.String targetColumnName,
java.util.Map<java.lang.String,water.fvec.Frame> targetEncodingMap,
TargetEncoder.DataLeakageHandlingStrategy dataLeakageHandlingStrategy,
java.lang.String foldColumn,
boolean withBlendedAvg,
boolean imputeNAsWithNewCategory,
BlendingParams blendingParams,
long seed)
public water.fvec.Frame applyTargetEncoding(water.fvec.Frame data,
java.lang.String targetColumnName,
java.util.Map<java.lang.String,water.fvec.Frame> targetEncodingMap,
TargetEncoder.DataLeakageHandlingStrategy dataLeakageHandlingStrategy,
java.lang.String foldColumn,
boolean withBlendedAvg,
long seed,
boolean imputeNAsWithNewCategory,
water.Key<water.fvec.Frame> encodedColumnName,
BlendingParams blendingParams)
public water.fvec.Frame applyTargetEncoding(water.fvec.Frame data,
java.lang.String targetColumnName,
java.util.Map<java.lang.String,water.fvec.Frame> targetEncodingMap,
TargetEncoder.DataLeakageHandlingStrategy dataLeakageHandlingStrategy,
boolean withBlendedAvg,
boolean imputeNAsWithNewCategory,
BlendingParams blendingParams,
long seed)
public water.fvec.Frame applyTargetEncoding(water.fvec.Frame data,
java.lang.String targetColumnName,
java.util.Map<java.lang.String,water.fvec.Frame> targetEncodingMap,
TargetEncoder.DataLeakageHandlingStrategy dataLeakageHandlingStrategy,
boolean withBlendedAvg,
double noiseLevel,
boolean imputeNAsWithNewCategory,
BlendingParams blendingParams,
long seed)
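The applyTargetEncoding overloads combine a data leakage handling strategy, optional blending, and optional noise. The plain-Java sketch below illustrates these three ideas on single values and does not use the H2O API. Several points are assumptions rather than facts taken from this page: that leave-one-out subtracts the current row's target from the category totals (suggested by the name SubtractCurrentRowForLeaveOneOutTask), that blending uses a sigmoid-shaped weight between the category mean and the global mean, and that noise is drawn uniformly from [-noiseLevel, noiseLevel]. The class `ApplyEncodingSketch`, its methods, and the parameters `k` and `f` are hypothetical.

```java
import java.util.Random;

public class ApplyEncodingSketch {

    /**
     * Leave-one-out mean: removes the current row's target from the category
     * totals before dividing. Yields NaN when the category has a single row.
     */
    static double leaveOneOut(double numerator, double denominator, double rowTarget) {
        return (numerator - rowTarget) / (denominator - 1);
    }

    /**
     * Blends the category (posterior) mean with the global (prior) mean.
     * The weight lambda grows towards 1 as the category row count n exceeds k;
     * f controls how gradual the transition is (assumed formula).
     */
    static double blend(double posteriorMean, double priorMean, double n, double k, double f) {
        double lambda = 1.0 / (1.0 + Math.exp((k - n) / f));
        return lambda * posteriorMean + (1.0 - lambda) * priorMean;
    }

    /** Adds uniform noise in [-noiseLevel, noiseLevel) to an encoded value. */
    static double addNoise(double encoded, double noiseLevel, Random rng) {
        return encoded + (rng.nextDouble() * 2 - 1) * noiseLevel;
    }

    public static void main(String[] args) {
        Random rng = new Random(1234L); // fixed seed for reproducibility

        double loo = leaveOneOut(2, 3, 1);
        double blended = blend(loo, 0.4, 3, 10, 20);
        double noisy = addNoise(blended, 0.01, rng);

        System.out.println("leave-one-out = " + loo);
        System.out.println("blended       = " + blended);
        System.out.println("with noise    = " + noisy);
    }
}
```

Passing the same seed to the random number generator reproduces the same noise, which is the role the seed parameter plays for tests.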