case class RegressionSplitter(randomizePivotLocation: Boolean = false) extends Splitter[Double] with Product with Serializable
Find the best split for regression problems.
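The best split is the one that minimizes the total weighted variance:

$$\mathrm{totalVariance} = N_\mathrm{left}\,\sigma_\mathrm{left}^2 + N_\mathrm{right}\,\sigma_\mathrm{right}^2$$

which, in Scala-ish notation, is:

totalVariance = leftWeight * (leftSquareSum / leftWeight - Math.pow(leftSum / leftWeight, 2)) + rightWeight * (rightSquareSum / rightWeight - Math.pow(rightSum / rightWeight, 2))

Expanding, this equals leftSquareSum - leftSum * leftSum / leftWeight + rightSquareSum - rightSum * rightSum / rightWeight. Because we only compare candidate splits of the same data, we can subtract the constant leftSquareSum + rightSquareSum (the total sum of squares, which does not depend on the split). Substituting rightSum = totalSum - leftSum and rightWeight = totalWeight - leftWeight yields:

totalVariance = -leftSum * leftSum / leftWeight - Math.pow(totalSum - leftSum, 2) / (totalWeight - leftWeight)

This depends only on leftSum and leftWeight, which are updated incrementally as the pivot moves, since totalSum and totalWeight are constant.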
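The scan implied by this expression can be sketched as follows. This is a minimal illustration, not the library's code: the object VarianceScoreSketch and its methods are hypothetical, labels are assumed to be already sorted by the feature value, and leftSum is assumed to be the weighted label sum (sum of w * y, with leftSquareSum the sum of w * y * y). weightedSse computes the un-simplified N * sigma^2 term so the two forms can be checked against each other.

```scala
// Hypothetical illustration of the split score above; not the library's code.
object VarianceScoreSketch {

  // Weighted label sum: sum of w * y (assumed meaning of leftSum/totalSum).
  private def wSum(ys: Seq[Double], ws: Seq[Double]): Double =
    ys.zip(ws).map { case (y, w) => w * y }.sum

  // Un-simplified N * sigma^2 term: sum(w * y^2) - (sum(w * y))^2 / sum(w).
  def weightedSse(ys: Seq[Double], ws: Seq[Double]): Double = {
    val s = wSum(ys, ws)
    val sq = ys.zip(ws).map { case (y, w) => w * y * y }.sum
    sq - s * s / ws.sum
  }

  // Simplified score from the last formula; lower is better.
  def score(leftSum: Double, leftWeight: Double,
            totalSum: Double, totalWeight: Double): Double =
    -leftSum * leftSum / leftWeight -
      Math.pow(totalSum - leftSum, 2) / (totalWeight - leftWeight)

  // Labels must already be sorted by feature value. Returns how many points
  // go left in the best split, updating only leftSum and leftWeight per step.
  def bestLeftCount(ys: Seq[Double], ws: Seq[Double]): Int = {
    val totalSum = wSum(ys, ws)
    val totalWeight = ws.sum
    var leftSum = 0.0
    var leftWeight = 0.0
    var bestScore = Double.PositiveInfinity
    var bestCount = 0
    for (i <- 0 until ys.length - 1) {
      leftSum += ws(i) * ys(i)
      leftWeight += ws(i)
      val s = score(leftSum, leftWeight, totalSum, totalWeight)
      if (s < bestScore) { bestScore = s; bestCount = i + 1 }
    }
    bestCount
  }
}
```

For the labels 1, 2, 10, 11 with unit weights, bestLeftCount returns 2, separating {1, 2} from {10, 11}; at that split, score gives -225, which equals weightedSse(left) + weightedSse(right) minus the total sum of squares (226).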
Created by maxhutch on 11/29/16.
Instance Constructors
- new RegressionSplitter(randomizePivotLocation: Boolean = false)
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[java.lang]
  - Definition Classes: AnyRef
  - Annotations: @native() @throws( ... )
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def finalize(): Unit
  - Attributes: protected[java.lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- def getBestCategoricalSplit(data: Seq[(Vector[AnyVal], Double, Double)], calculator: VarianceCalculator, index: Int, minCount: Int): (CategoricalSplit, Double)
  Find the best split on a categorical feature.
  - data: the data to split
  - index: index of the feature to split on
  - returns: the best split of this feature
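For illustration only, one common way to search categorical splits in regression trees (the CART ordering trick: sort categories by their mean label, then scan prefixes of that order) can be sketched as below. It is not necessarily what getBestCategoricalSplit does. The object CategoricalSplitSketch, its row layout (category, label, weight), and the use of String categories are hypothetical, and minCount is interpreted here as a minimum row count per side by analogy with getBestRealSplit. The score is the simplified totalVariance expression from the class description.

```scala
// Hypothetical sketch (CART-style ordering by mean label); not the library's code.
object CategoricalSplitSketch {

  // rows: (category, label, weight). Returns the categories sent left by the
  // best split, or None if no split keeps at least minCount rows on each side.
  def bestCategorySet(rows: Seq[(String, Double, Double)], minCount: Int): Option[Set[String]] = {
    // Per category: (sum of w * y, sum of w, row count)
    val stats: Seq[(String, (Double, Double, Int))] =
      rows.groupBy(_._1).toSeq.map { case (cat, rs) =>
        cat -> ((rs.map(r => r._3 * r._2).sum, rs.map(_._3).sum, rs.size))
      }
    // Order categories by weighted mean label, then scan prefixes of that order
    val ordered = stats.sortBy(e => e._2._1 / e._2._2)
    val totalSum = ordered.map(_._2._1).sum
    val totalWeight = ordered.map(_._2._2).sum
    var leftSum = 0.0
    var leftWeight = 0.0
    var leftCount = 0
    var best: Option[(Double, Int)] = None // (score, number of left categories)
    for (k <- 0 until ordered.size - 1) {
      val (_, (s, w, n)) = ordered(k)
      leftSum += s
      leftWeight += w
      leftCount += n
      if (leftCount >= minCount && rows.size - leftCount >= minCount) {
        val score = -leftSum * leftSum / leftWeight -
          Math.pow(totalSum - leftSum, 2) / (totalWeight - leftWeight)
        if (best.forall(_._1 > score)) best = Some((score, k + 1))
      }
    }
    best.map { case (_, k) => ordered.take(k).map(_._1).toSet }
  }
}
```

With two rows each of categories a (label 1), c (label 2), b (label 10), and d (label 11), and minCount = 1, the sketch sends {a, c} left; with minCount = 5 no split satisfies both sides and it returns None. Ordering by mean label reduces the search from all category subsets to a linear scan, which is the usual motivation for this technique.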
- def getBestRealSplit(data: Seq[(Vector[AnyVal], Double, Double)], calculator: VarianceCalculator, index: Int, minCount: Int, randomizePivotLocation: Boolean = false): (RealSplit, Double)
  Find the best split on a continuous feature.
  If randomizePivotLocation is true, each split pivot is drawn uniformly at random between the two adjacent data points it separates. Every such pivot produces the same partition of the data, but randomization can improve generalization, particularly as part of an ensemble (e.g. random forests).
  - data: the data to split
  - index: index of the feature to split on
  - minCount: minimum number of data points allowed in each of the resulting splits
  - randomizePivotLocation: whether to draw split pivots randomly between the data points (default: false)
  - returns: the best split of this feature
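The randomized pivot can be sketched as follows. This assumes that points with feature value <= pivot go left and that the non-random pivot is the midpoint; both are assumptions, and the object PivotSketch is hypothetical. Any pivot in [lo, hi) yields the same partition, which is why randomizing it does not change the data split.

```scala
import scala.util.Random

// Hypothetical helper, not the library's API. Points with x <= pivot go left.
object PivotSketch {

  // sortedX: feature values in ascending order; leftCount: points sent left.
  // Assumes sortedX(leftCount - 1) < sortedX(leftCount).
  def pivot(sortedX: IndexedSeq[Double], leftCount: Int,
            randomize: Boolean, rng: Random): Double = {
    val lo = sortedX(leftCount - 1) // last point on the left
    val hi = sortedX(leftCount)     // first point on the right
    if (randomize) {
      // Uniform draw in [lo, hi); clamp below hi to guard against rounding.
      Math.min(lo + rng.nextDouble() * (hi - lo), Math.nextDown(hi))
    } else {
      (lo + hi) / 2.0 // assumed deterministic choice: the midpoint
    }
  }
}
```

For sortedX = Vector(1.0, 2.0, 4.0, 5.0) and leftCount = 2, the deterministic pivot is 3.0, while randomized pivots fall in [2.0, 4.0) and always leave exactly two points on the left.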
- def getBestSplit(data: Seq[(Vector[AnyVal], Double, Double)], numFeatures: Int, minInstances: Int): (Split, Double)
  Get the best split, considering numFeatures randomly chosen features (sampled without replacement).
  - data: the data to split
  - numFeatures: number of features to consider, chosen randomly
  - returns: a split object that optimally divides the data
  - Definition Classes: RegressionSplitter → Splitter
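Sampling numFeatures features without replacement and keeping the best split among them can be sketched like this. FeatureSamplingSketch, sampleFeatures, and bestOver are hypothetical names; the per-feature scorer stands in for calls such as getBestRealSplit or getBestCategoricalSplit. Treating a lower Double as a better split follows the variance-minimizing score in the class description but is an assumption for this method.

```scala
import scala.util.Random

// Hypothetical sketch of "numFeatures random features (w/o replacement)".
object FeatureSamplingSketch {

  // Pick numFeatures distinct feature indices out of 0 until totalFeatures.
  def sampleFeatures(totalFeatures: Int, numFeatures: Int, rng: Random): Seq[Int] =
    rng.shuffle((0 until totalFeatures).toVector).take(numFeatures)

  // Score each sampled feature with a caller-supplied function returning
  // (split description, score) and keep the entry with the lowest score.
  def bestOver[S](features: Seq[Int])(scorer: Int => (S, Double)): Option[(S, Double)] =
    if (features.isEmpty) None else Some(features.map(scorer).minBy(_._2))
}
```

Shuffling the index range and taking a prefix guarantees distinct indices; when numFeatures exceeds the number of features, take simply returns all of them.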
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- val randomizePivotLocation: Boolean
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @native() @throws( ... )