Package elki.clustering.subspace
Class PROCLUS
- java.lang.Object
-
- elki.clustering.AbstractProjectedClustering<Clustering<SubspaceModel>>
-
- elki.clustering.subspace.PROCLUS
-
- All Implemented Interfaces:
elki.Algorithm,ClusteringAlgorithm<Clustering<SubspaceModel>>,SubspaceClusteringAlgorithm<SubspaceModel>
@Title("PROCLUS: PROjected CLUStering") @Description("Algorithm to find subspace clusters in high dimensional spaces.") @Reference(authors="C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park", title="Fast Algorithms for Projected Clustering", booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'99)", url="https://doi.org/10.1145/304181.304188", bibkey="doi:10.1145/304181.304188") public class PROCLUS extends AbstractProjectedClustering<Clustering<SubspaceModel>> implements SubspaceClusteringAlgorithm<SubspaceModel>The PROCLUS algorithm, an algorithm to find subspace clusters in high dimensional spaces.Reference:
C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park
Fast Algorithms for Projected Clustering
Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99).- Since:
- 0.1
- Author:
- Elke Achtert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static classPROCLUS.DoubleIntIntSimple triple.static classPROCLUS.ParParameterization class.private static classPROCLUS.PROCLUSClusterEncapsulates the attributes of a cluster.
-
Field Summary
Fields Modifier and Type Field Description private static elki.logging.LoggingLOGThe logger for this class.private intm_iMultiplier for the initial number of medoids.private elki.utilities.random.RandomFactoryrndRandom generator-
Fields inherited from class elki.clustering.AbstractProjectedClustering
k, k_i, l
-
-
Constructor Summary
Constructors Constructor Description PROCLUS(int k, int k_i, int l, int m_i, elki.utilities.random.RandomFactory rnd)Java constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private java.util.ArrayList<PROCLUS.PROCLUSCluster>assignPoints(elki.database.ids.ArrayDBIDs m_current, long[][] dimensions, elki.database.relation.Relation<? extends elki.data.NumberVector> database)Assigns the objects to the clusters.private doubleavgDistance(double[] centroid, elki.database.ids.DBIDs objectIDs, elki.database.relation.Relation<? extends elki.data.NumberVector> database, int dimension)Computes the average distance of the objects to the centroid along the specified dimension.private elki.database.ids.DBIDscomputeBadMedoids(elki.database.ids.ArrayDBIDs m_current, java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, int threshold)Computes the bad medoids, where the medoid of a cluster with less than the specified threshold of objects is bad.private long[][]computeDimensionMap(java.util.List<PROCLUS.DoubleIntInt> z_ijs, int dim, int numc)Compute the dimension map.private elki.database.ids.ArrayDBIDscomputeM_current(elki.database.ids.DBIDs m, elki.database.ids.DBIDs m_best, elki.database.ids.DBIDs m_bad, java.util.Random random)Computes the set of medoids in current iteration.private java.util.List<PROCLUS.DoubleIntInt>computeZijs(double[][] averageDistances, int dim)Compute the z_ij values.private doubleevaluateClusters(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, long[][] dimensions, elki.database.relation.Relation<? extends elki.data.NumberVector> database)Evaluates the quality of the clusters.private java.util.List<PROCLUS.PROCLUSCluster>finalAssignment(java.util.List<elki.utilities.pairs.Pair<double[],long[]>> dimensions, elki.database.relation.Relation<? extends elki.data.NumberVector> database)Refinement step to assign the objects to the final clusters.private long[][]findDimensions(elki.database.ids.ArrayDBIDs medoids, elki.database.relation.Relation<? extends elki.data.NumberVector> relation, elki.database.query.distance.DistanceQuery<? extends elki.data.NumberVector> distance, elki.database.query.range.RangeSearcher<elki.database.ids.DBIDRef> rangeQuery)Determines the set of correlated dimensions for each medoid in the specified medoid set.private java.util.List<elki.utilities.pairs.Pair<double[],long[]>>findDimensions(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, elki.database.relation.Relation<? extends elki.data.NumberVector> database)Refinement step that determines the set of correlated dimensions for each cluster centroid.elki.data.type.TypeInformation[]getInputTypeRestriction()private elki.database.datastore.DataStore<elki.database.ids.DBIDs>getLocalities(elki.database.ids.DBIDs medoids, elki.database.query.distance.DistanceQuery<? extends elki.data.NumberVector> distance, elki.database.query.range.RangeSearcher<elki.database.ids.DBIDRef> rangeQuery)Computes the localities of the specified medoids: for each medoid m the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between medoid m and any other medoid m_i.private elki.database.ids.ArrayDBIDsgreedy(elki.database.query.distance.DistanceQuery<? extends elki.data.NumberVector> distance, elki.database.ids.DBIDs sampleSet, int m, java.util.Random random)Returns a piercing set of k medoids from the specified sample set.private elki.database.ids.ArrayDBIDsinitialSet(elki.database.ids.DBIDs sampleSet, int k, java.util.Random random)Returns a set of k elements from the specified sample set.private doublemanhattanSegmentalDistance(elki.data.NumberVector o1, double[] o2, long[] dimensions)Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.private doublemanhattanSegmentalDistance(elki.data.NumberVector o1, elki.data.NumberVector o2, long[] dimensions)Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.<V extends elki.data.NumberVector>
Clustering<SubspaceModel>run(elki.database.relation.Relation<V> relation)Performs the PROCLUS algorithm on the given database.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Method Detail
-
getInputTypeRestriction
public elki.data.type.TypeInformation[] getInputTypeRestriction()
- Specified by:
getInputTypeRestrictionin interfaceelki.Algorithm
-
run
public <V extends elki.data.NumberVector> Clustering<SubspaceModel> run(elki.database.relation.Relation<V> relation)
Performs the PROCLUS algorithm on the given database.- Parameters:
relation- Relation to process
-
greedy
private elki.database.ids.ArrayDBIDs greedy(elki.database.query.distance.DistanceQuery<? extends elki.data.NumberVector> distance, elki.database.ids.DBIDs sampleSet, int m, java.util.Random random)Returns a piercing set of k medoids from the specified sample set.- Parameters:
distance- the distance functionsampleSet- the sample setm- the number of medoids to be returnedrandom- random number generator- Returns:
- a piercing set of m medoids from the specified sample set
-
initialSet
private elki.database.ids.ArrayDBIDs initialSet(elki.database.ids.DBIDs sampleSet, int k, java.util.Random random)Returns a set of k elements from the specified sample set.- Parameters:
sampleSet- the sample setk- the number of samples to be returnedrandom- random number generator- Returns:
- a set of k elements from the specified sample set
-
computeM_current
private elki.database.ids.ArrayDBIDs computeM_current(elki.database.ids.DBIDs m, elki.database.ids.DBIDs m_best, elki.database.ids.DBIDs m_bad, java.util.Random random)Computes the set of medoids in current iteration.- Parameters:
m- the medoidsm_best- the best set of medoids found so farm_bad- the bad medoidsrandom- random number generator- Returns:
- m_current, the set of medoids in current iteration
-
getLocalities
private elki.database.datastore.DataStore<elki.database.ids.DBIDs> getLocalities(elki.database.ids.DBIDs medoids, elki.database.query.distance.DistanceQuery<? extends elki.data.NumberVector> distance, elki.database.query.range.RangeSearcher<elki.database.ids.DBIDRef> rangeQuery)Computes the localities of the specified medoids: for each medoid m the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between medoid m and any other medoid m_i.- Parameters:
medoids- the ids of the medoidsdistance- the distance function- Returns:
- a mapping of the medoid's id to its locality
-
findDimensions
private long[][] findDimensions(elki.database.ids.ArrayDBIDs medoids, elki.database.relation.Relation<? extends elki.data.NumberVector> relation, elki.database.query.distance.DistanceQuery<? extends elki.data.NumberVector> distance, elki.database.query.range.RangeSearcher<elki.database.ids.DBIDRef> rangeQuery)Determines the set of correlated dimensions for each medoid in the specified medoid set.- Parameters:
medoids- the set of medoidsrelation- the relation containing the objectsdistance- the distance function- Returns:
- the set of correlated dimensions for each medoid in the specified medoid set
-
findDimensions
private java.util.List<elki.utilities.pairs.Pair<double[],long[]>> findDimensions(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, elki.database.relation.Relation<? extends elki.data.NumberVector> database)
Refinement step that determines the set of correlated dimensions for each cluster centroid.- Parameters:
clusters- the list of clustersdatabase- the database containing the objects- Returns:
- the set of correlated dimensions for each specified cluster centroid
-
computeZijs
private java.util.List<PROCLUS.DoubleIntInt> computeZijs(double[][] averageDistances, int dim)
Compute the z_ij values.- Parameters:
averageDistances- Average distancesdim- Dimensions- Returns:
- z_ij values
-
computeDimensionMap
private long[][] computeDimensionMap(java.util.List<PROCLUS.DoubleIntInt> z_ijs, int dim, int numc)
Compute the dimension map.- Parameters:
z_ijs- z_ij valuesdim- Number of dimensionsnumc- Number of clusters- Returns:
- Bitmap of dimensions used
-
assignPoints
private java.util.ArrayList<PROCLUS.PROCLUSCluster> assignPoints(elki.database.ids.ArrayDBIDs m_current, long[][] dimensions, elki.database.relation.Relation<? extends elki.data.NumberVector> database)
Assigns the objects to the clusters.- Parameters:
m_current- Current centersdimensions- set of correlated dimensions for each medoid of the clusterdatabase- the database containing the objects- Returns:
- the assignments of the object to the clusters
-
finalAssignment
private java.util.List<PROCLUS.PROCLUSCluster> finalAssignment(java.util.List<elki.utilities.pairs.Pair<double[],long[]>> dimensions, elki.database.relation.Relation<? extends elki.data.NumberVector> database)
Refinement step to assign the objects to the final clusters.- Parameters:
dimensions- pair containing the centroid and the set of correlated dimensions for the centroiddatabase- the database containing the objects- Returns:
- the assignments of the object to the clusters
-
manhattanSegmentalDistance
private double manhattanSegmentalDistance(elki.data.NumberVector o1, elki.data.NumberVector o2, long[] dimensions)Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.- Parameters:
o1- the first objecto2- the second objectdimensions- the dimensions to be considered- Returns:
- the Manhattan segmental distance between o1 and o2 relative to the specified dimensions
-
manhattanSegmentalDistance
private double manhattanSegmentalDistance(elki.data.NumberVector o1, double[] o2, long[] dimensions)Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.- Parameters:
o1- the first objecto2- the second objectdimensions- the dimensions to be considered- Returns:
- the Manhattan segmental distance between o1 and o2 relative to the specified dimensions
-
evaluateClusters
private double evaluateClusters(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, long[][] dimensions, elki.database.relation.Relation<? extends elki.data.NumberVector> database)
Evaluates the quality of the clusters.- Parameters:
clusters- the clusters to be evaluateddimensions- the dimensions associated with each clusterdatabase- the database holding the objects- Returns:
- a measure for the cluster quality
-
avgDistance
private double avgDistance(double[] centroid, elki.database.ids.DBIDs objectIDs, elki.database.relation.Relation<? extends elki.data.NumberVector> database, int dimension)Computes the average distance of the objects to the centroid along the specified dimension.- Parameters:
centroid- the centroidobjectIDs- the set of objects idsdatabase- the database holding the objectsdimension- the dimension for which the average distance is computed- Returns:
- the average distance of the objects to the centroid along the specified dimension
-
computeBadMedoids
private elki.database.ids.DBIDs computeBadMedoids(elki.database.ids.ArrayDBIDs m_current, java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, int threshold)Computes the bad medoids, where the medoid of a cluster with less than the specified threshold of objects is bad.- Parameters:
m_current- Current medoidsclusters- the clustersthreshold- the threshold- Returns:
- the bad medoids
-
-