Module ML.Cluster.Butina

Implementation of the clustering algorithm published in:
Butina, J. Chem. Inf. Comput. Sci. 39, 747-750 (1999)
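The module's own source is not reproduced on this page. As an illustration of the sphere-exclusion idea behind Butina's method, here is a minimal sketch. It assumes Euclidean distance on coordinate data and treats two points as neighbors when their distance is at most the threshold. The name `butina_sketch` is hypothetical, and this is not the module's code. The method builds each point's neighbor list, visits points in order of decreasing neighbor count, and makes each not-yet-assigned point a centroid that absorbs its unassigned neighbors.

```python
import math
from typing import List, Sequence, Tuple


def butina_sketch(points: Sequence[Sequence[float]],
                  dist_thresh: float) -> Tuple[Tuple[int, ...], ...]:
    """Sphere-exclusion clustering in the style of Butina (1999).

    Hypothetical helper for illustration only: assumes Euclidean distance
    and treats points with distance <= dist_thresh as neighbors.
    """
    n = len(points)
    # Step 1: build the neighbor list of every point.
    nbrs: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if math.dist(points[i], points[j]) <= dist_thresh:
                nbrs[i].append(j)
                nbrs[j].append(i)
    # Step 2: visit points by decreasing neighbor count
    # (sorted() is stable, so ties keep the lower index first).
    order = sorted(range(n), key=lambda k: len(nbrs[k]), reverse=True)
    seen = [False] * n
    clusters: List[Tuple[int, ...]] = []
    for idx in order:
        if seen[idx]:
            continue
        # Step 3: idx becomes a centroid; its unassigned neighbors join it.
        seen[idx] = True
        members = [idx]
        for nb in nbrs[idx]:
            if not seen[nb]:
                seen[nb] = True
                members.append(nb)
        clusters.append(tuple(members))
    return tuple(clusters)


# Four points on a line, one unit apart: point 1 has the most neighbors.
print(butina_sketch([(0, 0), (1, 0), (2, 0), (3, 0)], 1.0))
```

For the four collinear points this prints `((1, 0, 2), (3,))`. Point 1 becomes a centroid and absorbs 0 and 2. Point 3 is left as a singleton because its only neighbor has already been claimed. This sketch breaks ties in neighbor count by taking the lower index first. That is an arbitrary choice, and the module may order ties differently.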



Functions
 
ClusterData(data, nPts, distThresh, isDistData=False)
Clusters the data points passed in and returns the clusters.
Variables
  logger = RDLogger.logger()
Function Details

ClusterData(data, nPts, distThresh, isDistData=False)

Clusters the data points passed in and returns the clusters as a tuple of tuples (see **Returns**).

**Arguments**

  - data: a list of lists (or an array, or any other indexable sequence
    of points) holding the input data (see the discussion of the
    _isDistData_ argument for the exception)

  - nPts: the number of points to be used

  - distThresh: elements within this distance of each other are
    considered to be neighbors

  - isDistData: set this flag when the data passed in is a distance
    matrix.  Because the matrix is symmetric, only its lower triangle
    (without the diagonal) is stored, as a flat sequence dists of
    length nPts*(nPts-1)/2, so that the private helper _LookupDist
    can retrieve the results:
      for i<j: d_ij = dists[j*(j-1)/2 + i]
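The storage layout described for _isDistData_ can be illustrated with a few helpers. This is a minimal sketch assuming a user-supplied pairwise distance function. `condensed_index`, `build_condensed` and `lookup` are hypothetical names, not the module's private _LookupDist. Walking j over rows and i < j within each row appends entries in exactly the order given by j*(j-1)/2 + i.

```python
from typing import Callable, List, Sequence


def condensed_index(i: int, j: int) -> int:
    """Position of d_ij in the flat lower-triangle list: j*(j-1)/2 + i for i < j."""
    if i == j:
        raise ValueError("the diagonal (d_ii = 0) is not stored")
    if i > j:  # the matrix is symmetric, so d_ij == d_ji
        i, j = j, i
    return j * (j - 1) // 2 + i  # j*(j-1) is always even, so // is exact


def build_condensed(n_pts: int, dist: Callable[[int, int], float]) -> List[float]:
    """Flatten a symmetric distance matrix row by row, lower triangle only."""
    dists: List[float] = []
    for j in range(n_pts):
        for i in range(j):  # i < j: entries land at condensed_index(i, j)
            dists.append(dist(i, j))
    return dists


def lookup(dists: Sequence[float], i: int, j: int) -> float:
    """Recover d_ij from the flat list (0.0 on the diagonal)."""
    return 0.0 if i == j else dists[condensed_index(i, j)]


# Hypothetical 1-D data: the distance between points is |x_i - x_j|.
x = [0, 1, 3, 6]
d = build_condensed(len(x), lambda i, j: abs(x[i] - x[j]))
print(d)                 # [1, 3, 2, 6, 5, 3]
print(lookup(d, 3, 1))   # 5
```

Note that this row-wise lower-triangle order is not the same as the row-wise upper-triangle order used by some other libraries' "condensed" matrices. A list built for another tool may need to be reordered before it is passed in with isDistData set.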

**Returns**

  - a tuple of tuples containing information about the clusters:
     ( (cluster1_elem1, cluster1_elem2, ...),
       (cluster2_elem1, cluster2_elem2, ...),
       ...
     )
     Each element is the index of a data point, and the first
     element of each cluster is its centroid.
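The returned structure is often converted into a per-point assignment. This is a minimal sketch assuming the return value has the tuple-of-tuples shape described above. `centroid_labels` is a hypothetical helper, and the input `((1, 0, 2), (3,))` is an illustrative value, not real output. The helper maps every point index to the centroid of its cluster. It also checks that no point appears in two clusters, since Butina clusters are expected to be disjoint.

```python
from typing import Dict, Sequence


def centroid_labels(clusters: Sequence[Sequence[int]]) -> Dict[int, int]:
    """Map each point index to the centroid (first element) of its cluster."""
    labels: Dict[int, int] = {}
    for cluster in clusters:
        centroid = cluster[0]
        for member in cluster:
            if member in labels:
                raise ValueError(f"point {member} appears in more than one cluster")
            labels[member] = centroid
    return labels


# Illustrative result in the documented shape (not real module output).
clusters = ((1, 0, 2), (3,))
print(centroid_labels(clusters))      # {1: 1, 0: 1, 2: 1, 3: 3}
print([c[0] for c in clusters])       # centroids: [1, 3]
```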