public class CoverTree extends NearestNeighbourSearch implements TechnicalInformationHandler
@inproceedings{Beygelzimer2006,
address = {New York, NY, USA},
author = {Alina Beygelzimer and Sham Kakade and John Langford},
booktitle = {ICML'06: Proceedings of the 23rd international conference on Machine learning},
pages = {97-104},
publisher = {ACM Press},
title = {Cover trees for nearest neighbor},
year = {2006},
location = {Pittsburgh, Pennsylvania},
HTTP = {http://hunch.net/\~jl/projects/cover_tree/cover_tree.html}
}
Valid options are:
-B <value> Set base of the expansion constant (default = 1.3).
| Modifier and Type | Class and Description |
|---|---|
class |
CoverTree.CoverTreeNode
class representing a node of the cover tree.
|
protected class |
CoverTree.MyHeap
A class for a heap to store the nearest k neighbours to an instance.
|
protected class |
CoverTree.MyHeapElement
A class for storing data about a neighboring instance.
|
NearestNeighbourSearch.NeighborList, NearestNeighbourSearch.NeighborNode| Modifier and Type | Field and Description |
|---|---|
protected double |
il2
if we have base 2 then this can be viewed as 1/ln(2), which can be used
later on to do il2*ln(d) instead of ln(d)/ln(2), to get log2(d), in
get_scale method.
|
protected double |
m_Base
The base of our expansion constant.
|
protected double[] |
m_DistanceList
Array holding the distances of the nearest neighbours.
|
protected EuclideanDistance |
m_EuclideanDistance
The euclidean distance function to use.
|
protected int |
m_MaxDepth
Number of nodes in the tree.
|
protected int |
m_NumLeaves
Number of nodes in the tree.
|
protected int |
m_NumNodes
Number of nodes in the tree.
|
protected CoverTree.CoverTreeNode |
m_Root
The root node.
|
protected TreePerformanceStats |
m_TreeStats
Tree Stats variables.
|
m_DistanceFunction, m_Instances, m_kNN, m_MeasurePerformance, m_Stats| Constructor and Description |
|---|
CoverTree()
default constructor.
|
| Modifier and Type | Method and Description |
|---|---|
void |
addInstanceInfo(Instance ins)
Adds the given instance info.
|
java.lang.String |
baseTipText()
Returns the tip text for this property.
|
protected CoverTree.CoverTreeNode |
batch_insert(java.lang.Integer p,
int max_scale,
int top_scale,
Stack<weka.core.neighboursearch.CoverTree.DistanceNode> point_set,
Stack<weka.core.neighboursearch.CoverTree.DistanceNode> consumed_set)
Creates a cover tree recursively using batch insert method.
|
protected void |
batch_nearest_neighbor(int k,
CoverTree.CoverTreeNode tree_root,
CoverTree.CoverTreeNode query_root,
Stack<NearestNeighbourSearch.NeighborList> results)
Performs k-NN search for a batch of queries provided in the form of a cover
tree.
|
protected void |
brute_nearest(int k,
CoverTree.CoverTreeNode query,
Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set,
CoverTree.MyHeap upper_k,
Stack<NearestNeighbourSearch.NeighborList> results)
Does a brute force NN search on the nodes in the given zero set.
|
protected void |
buildCoverTree(Instances insts)
Builds the tree on the given set of instances.
|
protected void |
checkMissing(Instances instances)
Checks if there is any instance with missing values.
|
protected double |
compare(int p1,
int p2,
Stack<weka.core.neighboursearch.CoverTree.d_node> cover_set)
Returns the difference of two given nodes distance to the query.
|
protected void |
copy_cover_sets(CoverTree.CoverTreeNode query_chi,
CoverTree.MyHeap new_upper_k,
Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets,
Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> new_cover_sets,
int current_scale,
int max_scale)
Copies the contents of one set of cover sets to the other.
|
protected void |
copy_zero_set(CoverTree.CoverTreeNode query_chi,
CoverTree.MyHeap new_upper_k,
Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set,
Stack<weka.core.neighboursearch.CoverTree.d_node> new_zero_set)
Copies the contents of one zero set to the other.
|
protected int |
descend(CoverTree.CoverTreeNode query,
CoverTree.MyHeap upper_k,
int current_scale,
int max_scale,
Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets,
Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set)
This functions adds nodes for inspection at the next level during NN
search.
|
protected double |
dist_of_scale(int s)
Returns the distance/value of a given scale/level.
|
protected void |
dist_split(Stack<weka.core.neighboursearch.CoverTree.DistanceNode> point_set,
Stack<weka.core.neighboursearch.CoverTree.DistanceNode> new_point_set,
weka.core.neighboursearch.CoverTree.DistanceNode new_point,
int max_scale)
Moves all the points in point_set covered by (the ball of) new_point into
new_point_set, based on the given scale/level.
|
java.util.Enumeration<java.lang.String> |
enumerateMeasures()
Returns an enumeration of the additional measure names.
|
protected NearestNeighbourSearch.NeighborList |
findKNearest(Instance target,
int k)
Performs k-NN serach for a single given query/test Instance.
|
protected int |
get_scale(double d)
Finds the scale/level of a given value.
|
double |
getBase()
Returns the base in use for expansion constant.
|
protected Stack<weka.core.neighboursearch.CoverTree.d_node> |
getCoverSet(int idx,
Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets)
Returns a cover set for a given level/scale.
|
double[] |
getDistances()
Returns the distances of the (k)-NN(s) found earlier by
kNearestNeighbours()/nearestNeighbour().
|
double |
getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure.
|
java.lang.String[] |
getOptions()
Gets the current settings of KDtree.
|
java.lang.String |
getRevision()
Returns the revision string.
|
TechnicalInformation |
getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed
information about the technical background of this class, e.g., paper
reference or book this class is based on.
|
java.lang.String |
globalInfo()
Returns a string describing this nearest neighbour search algorithm.
|
protected void |
halfsort(Stack<weka.core.neighboursearch.CoverTree.d_node> cover_set)
Half-sorts a cover set, so that nodes nearer to the query are at the front.
|
protected void |
internal_batch_nearest_neighbor(int k,
CoverTree.CoverTreeNode query_node,
Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets,
Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set,
int current_scale,
int max_scale,
CoverTree.MyHeap upper_k,
Stack<NearestNeighbourSearch.NeighborList> results)
Performs a recursive k-NN search for a given batch of queries provided in
the form of a cover tree.
|
Instances |
kNearestNeighbours(Instance target,
int k)
Returns k-NNs of a given target instance, from among the previously
supplied training instances (supplied through setInstances method) P.S.:
May return more than k-NNs if more one instances have the same distance to
the target as the kth NN.
|
java.util.Enumeration<Option> |
listOptions()
Returns an enumeration describing the available options.
|
static void |
main(java.lang.String[] args)
Method for testing the class from command line.
|
protected double |
max_set(Stack<weka.core.neighboursearch.CoverTree.DistanceNode> v)
Returns the max distance of the reference point p in current node to it's
children nodes.
|
double |
measureMaxDepth()
Returns the depth of the tree.
|
double |
measureNumLeaves()
Returns the number of leaves.
|
double |
measureTreeSize()
Returns the size of the tree.
|
Instance |
nearestNeighbour(Instance target)
Returns the NN instance of a given target instance, from among the
previously supplied training instances.
|
protected CoverTree.CoverTreeNode |
new_leaf(java.lang.Integer idx)
Creates a new leaf node for a given Instance/point p.
|
protected CoverTree.CoverTreeNode |
new_node(java.lang.Integer idx)
Creates a new internal node for a given Instance/point p.
|
protected static void |
print_space(int s)
Prints the specified number of spaces.
|
protected static void |
print(int depth,
CoverTree.CoverTreeNode top_node)
Prints a cover tree starting from the given node.
|
protected static void |
print(java.lang.Object o)
Prints an object to stdout.
|
protected static void |
print(java.lang.String s)
Prints a string to stdout.
|
protected static void |
println(java.lang.Object o)
Prints an object to stdout followed by newline.
|
protected static void |
println(java.lang.String s)
Prints a string to stdout followed by newline.
|
void |
setBase(double b)
Sets the base to use for expansion constant.
|
void |
setDistanceFunction(DistanceFunction df)
Sets the distance function to use for nearest neighbour search.
|
void |
setInstances(Instances instances)
Builds the Cover Tree on the given set of instances.
|
void |
setOptions(java.lang.String[] options)
Parses a given list of options.
|
protected void |
setter(CoverTree.MyHeap heap,
double upper_bound,
int k)
Initializes a heap with k values of the the given upper_bound.
|
protected boolean |
shell(double parent_query_dist,
double child_parent_dist,
double upper_bound)
Function to check if a child node can be inside a query ball, without
calculating the child node's distance to the query.
|
protected void |
split(Stack<weka.core.neighboursearch.CoverTree.DistanceNode> point_set,
Stack<weka.core.neighboursearch.CoverTree.DistanceNode> far_set,
int max_scale)
Splits a given point_set into near and far based on the given scale/level.
|
protected void |
SWAP(int a,
int b,
Stack<weka.core.neighboursearch.CoverTree.d_node> cover_set)
Swap two nodes in a cover set.
|
protected void |
update(CoverTree.MyHeap upper_bound,
double new_bound)
Replaces the current top/max value in the heap with the new one.
|
void |
update(Instance ins)
Adds an instance to the cover tree.
|
combSort11, distanceFunctionTipText, getDistanceFunction, getInstances, getMeasurePerformance, getPerformanceStats, measurePerformanceTipText, partition, quickSort, setMeasurePerformanceprotected EuclideanDistance m_EuclideanDistance
protected CoverTree.CoverTreeNode m_Root
protected double[] m_DistanceList
protected int m_NumNodes
protected int m_NumLeaves
protected int m_MaxDepth
protected TreePerformanceStats m_TreeStats
protected double m_Base
protected double il2
public java.lang.String globalInfo()
globalInfo in class NearestNeighbourSearchpublic TechnicalInformation getTechnicalInformation()
getTechnicalInformation in interface TechnicalInformationHandlerpublic java.util.Enumeration<Option> listOptions()
listOptions in interface OptionHandlerlistOptions in class NearestNeighbourSearchpublic void setOptions(java.lang.String[] options)
throws java.lang.Exception
-B <value> Set base of the expansion constant (default = 1.3).
setOptions in interface OptionHandlersetOptions in class NearestNeighbourSearchoptions - the list of options as an array of stringsjava.lang.Exception - if an option is not supportedpublic java.lang.String[] getOptions()
getOptions in interface OptionHandlergetOptions in class NearestNeighbourSearchprotected double dist_of_scale(int s)
s - the level/scaleprotected int get_scale(double d)
d - the value whose scale/level is to be determined.protected CoverTree.CoverTreeNode new_node(java.lang.Integer idx)
idx - The index of the instance the node represents.protected CoverTree.CoverTreeNode new_leaf(java.lang.Integer idx)
idx - The index of the instance this leaf node represents.protected double max_set(Stack<weka.core.neighboursearch.CoverTree.DistanceNode> v)
v - The stack of DistanceNode objects.protected void split(Stack<weka.core.neighboursearch.CoverTree.DistanceNode> point_set, Stack<weka.core.neighboursearch.CoverTree.DistanceNode> far_set, int max_scale)
point_set - The supplied set from which all far points would be
removed.far_set - The set in which all far points having distance >
base^max_scale would be put into.max_scale - The given scale based on which the distances of points are
judged to be far or near.protected void dist_split(Stack<weka.core.neighboursearch.CoverTree.DistanceNode> point_set, Stack<weka.core.neighboursearch.CoverTree.DistanceNode> new_point_set, weka.core.neighboursearch.CoverTree.DistanceNode new_point, int max_scale)
point_set - The supplied set of instances from which all points
covered by new_point will be removed.new_point_set - The set in which all points covered by new_point will
be put into.new_point - The given new point.max_scale - The scale based on which distances are judged (radius of
cover ball is calculated).protected CoverTree.CoverTreeNode batch_insert(java.lang.Integer p, int max_scale, int top_scale, Stack<weka.core.neighboursearch.CoverTree.DistanceNode> point_set, Stack<weka.core.neighboursearch.CoverTree.DistanceNode> consumed_set)
p - The index of the instance from which to create the first node. All
other points will be inserted beneath this node for p.max_scale - The current scale/level where the node is to be created
(Also determines the radius of the cover balls created at this
level).top_scale - The max scale in the whole tree.point_set - The set of unprocessed points from which child nodes need
to be created.consumed_set - The set of processed points from which child nodes have
already been created. This would be used to find the radius of the
cover ball of p.protected void buildCoverTree(Instances insts) throws java.lang.Exception
insts - The instances on which to build the cover tree.java.lang.Exception - If the supplied set of Instances is empty, or if there
are missing values.protected void setter(CoverTree.MyHeap heap, double upper_bound, int k) throws java.lang.Exception
heap - The heap to put values into.upper_bound - The value to put into heap (the value with which it
should be initialized).k - The number of times upper_bound should be put into heap for
initialization.java.lang.Exception - If there is some problem in initializing the heap (if k
> size of the heap).protected void update(CoverTree.MyHeap upper_bound, double new_bound) throws java.lang.Exception
upper_bound - The heap.new_bound - The new value that should replace the old top one.java.lang.Exception - if the new value is greater than the old value.protected Stack<weka.core.neighboursearch.CoverTree.d_node> getCoverSet(int idx, Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets)
idx - The level/scale for which the cover set is required.cover_sets - The covers sets. Consists of stack of a stack of d_node
objects.protected void copy_zero_set(CoverTree.CoverTreeNode query_chi, CoverTree.MyHeap new_upper_k, Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set, Stack<weka.core.neighboursearch.CoverTree.d_node> new_zero_set) throws java.lang.Exception
query_chi - The child node of our query node that we are going to
inspect.new_upper_k - New heap that will store the distances of the k NNs for
query_chi.zero_set - The zero set of query_chi's parent that needs to be copied.new_zero_set - The new zero set of query_chi where old zero sets need
to be copied into.java.lang.Exception - If there is some problem.protected void copy_cover_sets(CoverTree.CoverTreeNode query_chi, CoverTree.MyHeap new_upper_k, Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets, Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> new_cover_sets, int current_scale, int max_scale) throws java.lang.Exception
query_chi - The child node of our query node that we are going to
inspect.new_upper_k - New heap that will store the distances of the k NNs for
query_chi.cover_sets - The cover_sets of query_chi's parent, which need to be
copied to new_cover_sets.new_cover_sets - The new set of cover_sets that need to contain
contents of cover_sets.current_scale - The scale/level we are inspecting in our cover tree.max_scale - The maximum level so far possible in our search (this is
only updated as we descend and a deeper child is found inside the
query ball).java.lang.Exception - If there is problem.protected void SWAP(int a,
int b,
Stack<weka.core.neighboursearch.CoverTree.d_node> cover_set)
a - The index first node.b - The index of second node.cover_set - The cover set in which the two nodes are.protected double compare(int p1,
int p2,
Stack<weka.core.neighboursearch.CoverTree.d_node> cover_set)
p1 - The index of first node.p2 - The index of second node.cover_set - The cover set containing the two given nodes.protected void halfsort(Stack<weka.core.neighboursearch.CoverTree.d_node> cover_set)
cover_set - The cover set to sort.protected boolean shell(double parent_query_dist,
double child_parent_dist,
double upper_bound)
parent_query_dist - The distance of parent to the querychild_parent_dist - The distance of child to the parent.upper_bound - The distance to the query of the best kth NN found so
far.protected int descend(CoverTree.CoverTreeNode query, CoverTree.MyHeap upper_k, int current_scale, int max_scale, Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets, Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set) throws java.lang.Exception
query - The query (in shape of a cover tree node, as we are doing
batch searching).upper_k - Heap containing distances of best k-NNs found so far.current_scale - The current scale/level being looked at in the tree.max_scale - The max scale/level that has so far been looked at.cover_sets - The cover sets of tree nodes for each level of our trees
for.zero_set - The set containing leaf nodes.java.lang.Exception - If there is some problem (in updating the heap upper_k).protected void brute_nearest(int k,
CoverTree.CoverTreeNode query,
Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set,
CoverTree.MyHeap upper_k,
Stack<NearestNeighbourSearch.NeighborList> results)
throws java.lang.Exception
k - The k in kNN.query - The query.zero_set - The zero set on which the brute force NN search is
performed.upper_k - The heap storing distances of k-NNs found during the search.results - The returned k-NNs.java.lang.Exception - If there is somem problem.protected void internal_batch_nearest_neighbor(int k,
CoverTree.CoverTreeNode query_node,
Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets,
Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set,
int current_scale,
int max_scale,
CoverTree.MyHeap upper_k,
Stack<NearestNeighbourSearch.NeighborList> results)
throws java.lang.Exception
k - The number of NNs to find.query_node - The node of the query tree to start the search from.cover_sets - The set of sets that contains internal nodes that were
found to be inside the query ball at previous scales/levels
(intially there would be just the root node at root level).zero_set - The set that'll contain the leaf nodes that are found to be
inside the query ball.current_scale - The level/scale to do the search from (this value
would be used to inspect the cover set in the provided set of
cover sets).max_scale - The max scale/level that has so far been inspected.upper_k - The heap containing distances of the best k-NNs found so far
(initialized to Double.POSITIVE_INFINITY).results - The list of returned k-NNs.java.lang.Exception - If there is some problem during the search.protected void batch_nearest_neighbor(int k,
CoverTree.CoverTreeNode tree_root,
CoverTree.CoverTreeNode query_root,
Stack<NearestNeighbourSearch.NeighborList> results)
throws java.lang.Exception
k - The number of k-NNs to find.tree_root - The root of the cover tree on which k-NN search is to be
performed.query_root - The root of the cover tree consisting of queries.results - The list of returned k-NNs.java.lang.Exception - If there is some problem during the search.protected NearestNeighbourSearch.NeighborList findKNearest(Instance target, int k) throws java.lang.Exception
target - The query/test instance.k - Number of k-NNs to find.java.lang.Exception - If there is some problem during the search for k-NNs.public Instances kNearestNeighbours(Instance target, int k) throws java.lang.Exception
kNearestNeighbours in class NearestNeighbourSearchtarget - The instance for which k-NNs are required.k - The number of k-NNs to find.java.lang.Exception - If there is some problem find the k-NNs.public Instance nearestNeighbour(Instance target) throws java.lang.Exception
nearestNeighbour in class NearestNeighbourSearchtarget - The instance for which NN is required.java.lang.Exception - If there is some problem finding the nearest neighbour.public double[] getDistances()
throws java.lang.Exception
getDistances in class NearestNeighbourSearchjava.lang.Exception - If the tree hasn't been built (by calling
setInstances()), or none of kNearestNeighbours() or
nearestNeighbour() has been called before.protected void checkMissing(Instances instances) throws java.lang.Exception
instances - the instances to checkjava.lang.Exception - if missing values are encounteredpublic void setInstances(Instances instances) throws java.lang.Exception
setInstances in class NearestNeighbourSearchinstances - The insts on which the Cover Tree is to be built.java.lang.Exception - If some error occurs while building the Cover Treepublic void update(Instance ins) throws java.lang.Exception
update in class NearestNeighbourSearchins - The instance to add.java.lang.Exception - Alway throws this, as current implementation doesn't
allow addition of instances after building.public void addInstanceInfo(Instance ins)
addInstanceInfo in class NearestNeighbourSearchins - The instance to add the information of. Usually this is the test
instance supplied to update the range of attributes in the
distance function.public void setDistanceFunction(DistanceFunction df) throws java.lang.Exception
setDistanceFunction in class NearestNeighbourSearchdf - the distance function to usejava.lang.Exception - if not EuclideanDistancepublic java.lang.String baseTipText()
public double getBase()
public void setBase(double b)
b - the new base;public double measureTreeSize()
public double measureNumLeaves()
public double measureMaxDepth()
public java.util.Enumeration<java.lang.String> enumerateMeasures()
enumerateMeasures in interface AdditionalMeasureProducerenumerateMeasures in class NearestNeighbourSearchpublic double getMeasure(java.lang.String additionalMeasureName)
getMeasure in interface AdditionalMeasureProducergetMeasure in class NearestNeighbourSearchadditionalMeasureName - the name of the measure to query for its valuejava.lang.IllegalArgumentException - if the named measure is not supportedprotected static void print(java.lang.String s)
s - The string to print.protected static void println(java.lang.String s)
s - The string to print.protected static void print(java.lang.Object o)
o - The object to print.protected static void println(java.lang.Object o)
o - The object to print.protected static void print_space(int s)
s - The number of space characters to print.protected static void print(int depth,
CoverTree.CoverTreeNode top_node)
depth - The depth of top_node.top_node - The node to start printing from.public java.lang.String getRevision()
getRevision in interface RevisionHandlerpublic static void main(java.lang.String[] args)
args - The supplied command line arguments.