class NumericHistogram extends AnyRef
A generic, re-usable histogram class that supports partial aggregations. The algorithm is a heuristic adapted from the following paper: Yael Ben-Haim and Elad Tom-Tov, "A streaming parallel decision tree algorithm", J. Machine Learning Research 11 (2010), pp. 849--872. Although there are no approximation guarantees, it appears to work well with adequate data and a large (e.g., 20-80) number of histogram bins.
Adapted from Hive's NumericHistogram. Can refer to https://github.com/apache/hive/blob/master/ql/src/ java/org/apache/hadoop/hive/ql/udf/generic/NumericHistogram.java
Differences:
- Declaring Coord and it's variables as public types for easy access in the HistogramNumeric class. 2. Add method getNumBins() for serialize NumericHistogram in NumericHistogramSerializer. 3. Add method addBin() for deserialize NumericHistogram in NumericHistogramSerializer. 4. In Hive's code, the method pass a serialized histogram, in Spark, this method pass a deserialized histogram. Here we change the code about merge bins.
- Since
3.3.0
- Alphabetic
- By Inheritance
- NumericHistogram
- AnyRef
- Any
- Hide All
- Show All
- Public
- All
Instance Constructors
- new NumericHistogram()
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
add(v: Double): Unit
Adds a new data point to the histogram approximation.
Adds a new data point to the histogram approximation. Make sure you have called either allocate() or merge() first. This method implements Algorithm #1 from Ben-Haim and Tom-Tov, "A Streaming Parallel Decision Tree Algorithm", JMLR 2010.
- v
The data point to add to the histogram approximation.
-
def
addBin(x: Double, y: Double, b: Int): Unit
Set a particular histogram bin with index.
-
def
allocate(num_bins: Int): Unit
Sets the number of histogram bins to use for approximating data.
Sets the number of histogram bins to use for approximating data.
- num_bins
Number of non-uniform-width histogram bins to use
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
getBin(b: Int): Coord
Returns a particular histogram bin.
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getNumBins(): Int
Returns the number of bins.
-
def
getUsedBins(): Int
Returns the number of bins currently being used by the histogram.
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isReady(): Boolean
Returns true if this histogram object has been initialized by calling merge() or allocate().
-
def
merge(other: NumericHistogram): Unit
Takes a histogram and merges it with the current histogram object.
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
reset(): Unit
Resets a histogram object to its initial state.
Resets a histogram object to its initial state. allocate() or merge() must be called again before use.
-
def
setUsedBins(nusedBins: Int): Unit
Set the number of bins currently being used by the histogram.
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()