Object

ai.chronon.aggregator.row

StatsGenerator

Related Doc: package row

object StatsGenerator

Module managing the FeatureStats schema, the aggregations to use for each data type, and aggregator construction.

Stats aggregation has an offline (batch) component and an online component. The metrics defined for stats depend on the schema of the join: its data types and column names. On the online side, this information comes from the JoinCodec/valueSchema. On the offline side, it comes directly from the outputTable. To keep the schemas consistent, the metrics in the schema are sorted by name (one column can have multiple metrics).
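
The effect of sorting metrics by name can be illustrated with a minimal sketch. It does not use the Chronon classes: `Metric`, `MetricOrderSketch`, `metricsFor`, the string type tags, and the `null_count` / `percentile` suffixes are all hypothetical names chosen for illustration, and the rule that maps types to metrics is an assumption.

```scala
// Hypothetical, simplified stand-in for a metric; not Chronon's MetricTransform.
final case class Metric(column: String, suffix: String) {
  def name: String = s"${column}_$suffix"
}

object MetricOrderSketch {
  // Assumed rule for illustration: every column gets a null count,
  // and numeric columns ("double", "long") also get a percentile metric.
  def metricsFor(fields: Seq[(String, String)]): Seq[Metric] = {
    val metrics = fields.flatMap { case (column, dataType) =>
      val forAny = Seq(Metric(column, "null_count"))
      val forNumeric =
        if (Set("double", "long").contains(dataType)) Seq(Metric(column, "percentile"))
        else Seq.empty[Metric]
      forAny ++ forNumeric
    }
    // Sorting by name makes the result independent of the input field order.
    metrics.sortBy(_.name)
  }
}
```

Under these assumptions, two callers that see the same fields in a different order (for example one reading a codec and one reading a table) would end up with the same metric order.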

Linear Supertypes
AnyRef, Any

Type Members

  1. case class MetricTransform(name: String, expression: InputTransform, operation: Operation, suffix: String = "", argMap: Map[String, String] = null) extends Product with Serializable

    MetricTransform represents a single statistic built on top of an input column.

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. object InputTransform extends Enumeration

    InputTransform acts as a signal of how to process the metric.

    IsNull: Check if the input is null.

    Raw: Operate on the input column.

    One: lit(true) in Spark. Used for row counts, which are leveraged to obtain null rate values.
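
    How IsNull and One combine into a null rate can be sketched in plain Scala. This is an illustration under stated assumptions, not the Chronon implementation: `NullRateSketch`, `isNull`, `one`, and `nullRate` are hypothetical names, and a `Seq[Option[Double]]` stands in for a nullable Spark column.

```scala
object NullRateSketch {
  // IsNull-style transform: 1 when the input is missing, 0 otherwise.
  def isNull(value: Option[Double]): Long = if (value.isEmpty) 1L else 0L

  // One-style transform: a constant 1 per row, so summing it gives the row count.
  def one(value: Option[Double]): Long = 1L

  // Null rate = (sum of IsNull) / (sum of One); defined as 0.0 for an empty column.
  def nullRate(column: Seq[Option[Double]]): Double = {
    val nulls = column.map(isNull).sum
    val total = column.map(one).sum
    if (total == 0L) 0.0 else nulls.toDouble / total
  }
}
```

    Because both quantities are plain sums, they can be accumulated as partial aggregates and divided only at the end.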

  5. def PSIKllSketch(reference: AnyRef, comparison: AnyRef, bins: Int = 128, eps: Double = 0.000001): AnyRef

    PSI (Population Stability Index) is a measure of the difference between two probability distributions. It is meant for continuous measures and is not defined when a bin has zero elements in either distribution. To support PSI for discrete measures, a small eps value is added to perturb the distribution in the bins.

    Existing rules of thumb are: PSI < 0.10 means "little shift", 0.10 < PSI < 0.25 means "moderate shift", and PSI > 0.25 means "significant shift, action required". Source: https://scholarworks.wmich.edu/dissertations/3208
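
    The computation can be sketched on two already-binned distributions using the usual definition, PSI = sum over bins of (p - q) * ln(p / q). This sketch works on plain arrays rather than KLL sketches, and `PsiSketch`, `smooth`, and `psi` are hypothetical names. The smoothing shown (add eps to every bin, then renormalize) is one simple way to avoid empty bins and is an assumption, not necessarily what PSIKllSketch does.

```scala
object PsiSketch {
  // Assumed smoothing: add eps to every bin and renormalize,
  // so the result has no zeros and its bins sum to 1 again.
  def smooth(pmf: Array[Double], eps: Double): Array[Double] = {
    val shifted = pmf.map(_ + eps)
    val total = shifted.sum
    shifted.map(_ / total)
  }

  // PSI = sum over bins of (p_i - q_i) * ln(p_i / q_i).
  def psi(reference: Array[Double], comparison: Array[Double], eps: Double = 1e-6): Double = {
    require(reference.length == comparison.length, "both distributions need the same bins")
    val p = smooth(reference, eps)
    val q = smooth(comparison, eps)
    p.zip(q).map { case (pi, qi) => (pi - qi) * math.log(pi / qi) }.sum
  }
}
```

    Every term is non-negative, since (p - q) and ln(p / q) always share a sign, so identical distributions give 0 and any difference increases the value.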

  6. def SeriesFinalizer(key: String, value: AnyRef): AnyRef

    Post-processing for finalized values or IRs when generating a time series of stats. In the case of percentiles, for example, we reduce to 5 values in order to generate candlesticks.

  7. def anyTransforms(column: String): Seq[MetricTransform]

    Stats applied to any column

  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. def buildAggPart(m: MetricTransform): AggregationPart

  10. def buildAggregator(metrics: Seq[MetricTransform], selectedSchema: StructType): RowAggregator

    Builds the RowAggregator used to compute stats on a dataframe, based on the given metrics.

  11. def buildMetrics(fields: Seq[(String, DataType)]): Seq[MetricTransform]

    Defines the metrics to be aggregated for the given data schema.

  12. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  13. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. val finalizedPercentilesMerged: Array[Double]

  17. val finalizedPercentilesSeries: Array[Double]

  18. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  19. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  20. val ignoreColumns: Seq[String]

  21. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  22. def lInfKllSketch(sketch1: AnyRef, sketch2: AnyRef, bins: Int = 128): AnyRef

  23. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  24. final def notify(): Unit

    Definition Classes
    AnyRef
  25. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  26. val nullRateSuffix: String

  27. val nullSuffix: String

  28. def numericTransforms(column: String): Seq[MetricTransform]

    Stats applied to numeric columns

  29. def regularize(doubles: Array[Double], eps: Double): Array[Double]

    Given a PMF, add and subtract small values to keep a valid probability distribution without zeros.
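
    One way to add and subtract small values while keeping the total mass at 1 is sketched below. It is an illustration written from the description above, not a copy of the library's implementation, and `RegularizeSketch` is a hypothetical name. It assumes eps is small enough that every non-zero bin stays positive after the subtraction.

```scala
object RegularizeSketch {
  // Give eps to every empty bin and take the same total mass back,
  // in equal shares, from the non-empty bins so the sum is unchanged.
  def regularize(pmf: Array[Double], eps: Double): Array[Double] = {
    val zeros = pmf.count(_ == 0.0)
    val nonZeros = pmf.length - zeros
    if (zeros == 0 || nonZeros == 0) pmf // nothing to fix, or nothing to take mass from
    else {
      val penalty = eps * zeros / nonZeros
      pmf.map(v => if (v == 0.0) eps else v - penalty)
    }
  }
}
```

    With two empty bins out of four and eps = 1e-6, each empty bin receives 1e-6 and each non-empty bin gives up 1e-6, so the array still sums to 1 up to floating-point rounding.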

  30. def statsInputSchema(valueSchema: StructType): StructType

    Input schema is the data required to update partial aggregations / stats.

    Given a valueSchema and a metric transform list, defines the schema expected by the stats aggregator (online and offline).

  31. def statsIrSchema(valueSchema: StructType): StructType

    A valueSchema (for the join) and a metric list uniquely define the IR schema to be used for the statistics. To support custom storage for statistic percentiles, this method would need to be modified. IR schemas are used to decode streaming partial aggregations as well as KvStore partial stats.

  32. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  33. def toString(): String

    Definition Classes
    AnyRef → Any
  34. val totalColumn: String

  35. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
