rdkit.ML.Cluster.ClusterUtils module

utility functions for clustering

rdkit.ML.Cluster.ClusterUtils.FindClusterCentroidFromDists(cluster, dists)
find the point in a cluster which has the smallest summed Euclidean distance to all others

Arguments

  • cluster: the cluster to work with

  • dists: the distance matrix to use for the points

Returns

  • the index of the centroid point
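
The selection rule described above can be sketched in plain Python without RDKit. This is a minimal illustration, not the library's implementation: the names condensed_index and find_centroid_index are hypothetical, and the flattened lower-triangle layout of dists is an assumption that the text above does not state.

```python
def condensed_index(i, j):
    """Position of the (i, j) distance in a flattened lower-triangle matrix.

    Assumed layout (not stated in the docs above): d(0,1), d(0,2), d(1,2),
    d(0,3), d(1,3), d(2,3), ...
    """
    if i == j:
        raise ValueError("no entry is stored for the diagonal")
    row, col = (i, j) if i < j else (j, i)
    return col * (col - 1) // 2 + row


def find_centroid_index(point_indices, dists):
    """Return the member with the smallest summed distance to the others."""
    best_idx, best_sum = -1, float("inf")
    for pt in point_indices:
        total = sum(dists[condensed_index(pt, other)]
                    for other in point_indices if other != pt)
        if total < best_sum:  # strict '<' keeps the first point on ties
            best_idx, best_sum = pt, total
    return best_idx


# Five points on a line; the pairwise distance is |xi - xj|.
coords = [0.0, 1.0, 2.0, 3.0, 10.0]
dists = [abs(coords[i] - coords[j])
         for j in range(len(coords)) for i in range(j)]

centroid = find_centroid_index([0, 1, 2, 3, 4], dists)
print(centroid)  # prints 2
```

The result is an index into the original point set, not a position within the cluster, which matches the "index of the centroid point" wording above.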


rdkit.ML.Cluster.ClusterUtils.GetNodeList(cluster)

returns an ordered list of all nodes below cluster

the ordering is done using the lengths of the child nodes

Arguments

  • cluster: the cluster in question

Returns

  • a list of the leaves below this cluster
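
One way to read "ordering is done using the lengths of the child nodes" is a post-order walk that visits larger children first. The sketch below assumes that reading; the Node class and node_list function are hypothetical stand-ins, not the RDKit cluster class or its traversal.

```python
class Node:
    """Hypothetical stand-in for a cluster-tree node (not the RDKit class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __len__(self):
        # A leaf counts as one point; an internal node counts its leaves.
        if not self.children:
            return 1
        return sum(len(child) for child in self.children)


def node_list(node):
    """Post-order listing that visits larger children first (assumed order)."""
    if not node.children:
        return [node]
    result = []
    for child in sorted(node.children, key=len, reverse=True):
        result.extend(node_list(child))
    result.append(node)  # the parent comes after everything below it
    return result


# Tree: root -> (a, pair), pair -> (b, c)
a, b, c = Node("a"), Node("b"), Node("c")
pair = Node("pair", [b, c])
root = Node("root", [a, pair])

names = [n.name for n in node_list(root)]
print(names)  # prints ['b', 'c', 'pair', 'a', 'root']
```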

rdkit.ML.Cluster.ClusterUtils.GetNodesDownToCentroids(cluster, above=1)

returns an ordered list of all nodes below cluster

rdkit.ML.Cluster.ClusterUtils.SplitIntoNClusters(cluster, n, breadthFirst=True)

splits a cluster tree into a set of branches

Arguments

  • cluster: the root of the cluster tree

  • n: the number of clusters to include in the split

  • breadthFirst: toggles breadth-first (vs depth-first) cleavage of the cluster tree.

Returns

  • a list of sub clusters
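
A breadth-first split can be pictured as a queue of branches in which the oldest branch that still has children is cleaved next, until n branches remain. The sketch below is one plausible reading under that assumption, using a hypothetical Node class and a binary tree; it does not reproduce how RDKit picks the next branch or orders the children.

```python
class Node:
    """Hypothetical stand-in for a cluster-tree node (not the RDKit class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __len__(self):
        # Number of leaf points below (or at) this node.
        if not self.children:
            return 1
        return sum(len(child) for child in self.children)


def split_breadth_first(root, n):
    """Cleave a binary tree into n branches, oldest splittable branch first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(root) < n:
        raise ValueError(f"cannot split {len(root)} points into {n} clusters")
    clusters = [root]
    while len(clusters) < n:
        # Fewer than n branches over at least n points means that some
        # branch still has children, so next() always finds one.
        idx = next(i for i, c in enumerate(clusters) if c.children)
        node = clusters.pop(idx)
        clusters.extend(node.children)  # new branches join the back of the queue
    return clusters


# Tree: root -> (ab, cde), ab -> (a, b), cde -> (c, de), de -> (d, e)
leaf = {name: Node(name) for name in "abcde"}
de = Node("de", [leaf["d"], leaf["e"]])
cde = Node("cde", [leaf["c"], de])
ab = Node("ab", [leaf["a"], leaf["b"]])
root = Node("root", [ab, cde])

three = [c.name for c in split_breadth_first(root, 3)]
print(three)  # prints ['cde', 'a', 'b']
```

Asking for as many clusters as there are points returns the individual leaves, and asking for more raises ValueError in this sketch.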