rdkit.Chem.rdRascalMCES module

Module containing implementation of RASCAL Maximum Common Edge Substructure algorithm.

rdkit.Chem.rdRascalMCES.FindMCES((rdkit.Chem.rdchem.Mol)mol1, (rdkit.Chem.rdchem.Mol)mol2[, (rdkit.Chem.rdMolDescriptors.AtomPairsParameters)opts=None]) list :

Find one or more MCESs between the 2 molecules given. Returns a list of RascalResult objects.

- mol1, mol2: The two molecules for which to find the MCES.
- opts: Optional RascalOptions object changing the default run mode.

C++ signature :

boost::python::list FindMCES(RDKit::ROMol,RDKit::ROMol [,boost::python::api::object=None])

rdkit.Chem.rdRascalMCES.RascalButinaCluster((rdkit.Chem.rdMolDescriptors.AtomPairsParameters)mols[, (rdkit.Chem.rdMolDescriptors.AtomPairsParameters)opts=None]) list :

Use the RASCAL MCES similarity metric to do Butina clustering (Butina JCICS 39 747-750 (1999)). Returns a list of lists of molecules, each inner list being a cluster. The last cluster is all the molecules that didn’t fit into another cluster (the singletons).

- mols: List of molecules to be clustered.
- opts: Optional RascalClusterOptions object changing the default run mode.

C++ signature :

boost::python::list RascalButinaCluster(boost::python::api::object [,boost::python::api::object=None])

rdkit.Chem.rdRascalMCES.RascalCluster((rdkit.Chem.rdMolDescriptors.AtomPairsParameters)mols[, (rdkit.Chem.rdMolDescriptors.AtomPairsParameters)opts=None]) list :

Use the RASCAL MCES similarity metric to do fuzzy clustering. Returns a list of lists of molecules, each inner list being a cluster. The last cluster is all the molecules that didn’t fit into another cluster (the singletons).

- mols: List of molecules to be clustered.
- opts: Optional RascalClusterOptions object changing the default run mode.

C++ signature :

boost::python::list RascalCluster(boost::python::api::object [,boost::python::api::object=None])

class rdkit.Chem.rdRascalMCES.RascalClusterOptions((object)arg1) None :

Bases: instance

RASCAL Cluster Options. Most of these pertain to RascalCluster calculations. Only similarityCutoff is used by RascalButinaCluster.

C++ signature :

void __init__(_object*)

property a

The penalty score for each unconnected component in the MCES. Default=0.05.

property b

The weight of matched bonds over matched atoms. Default=2.

property clusterMergeSim

Two clusters are merged if the fraction of molecules they have in common is greater than this. Default=0.6.

property maxNumFrags

The maximum number of fragments allowed in the MCES for each pair of molecules. Default=2. This prevents the MCES from consisting of many small fragments scattered around the molecules, which would give an inflated estimate of similarity.

property minFragSize

The minimum number of atoms in a fragment for it to be included in the MCES. Default=3.

property minIntraClusterSim

Two pairs of molecules are included in the same cluster if the similarity between their MCESs is greater than this. Default=0.9.

property numThreads

Number of threads to use during clustering. Default=-1 means all the hardware threads less one.

property similarityCutoff

Similarity cutoff for molecules to be in the same cluster. Between 0.0 and 1.0, default=0.7.

class rdkit.Chem.rdRascalMCES.RascalOptions((object)arg1) None :

Bases: instance

RASCAL Options

C++ signature :

void __init__(_object*)

property allBestMCESs

If True, reports all MCESs found of the same maximum size. Default False means just report the first found.

property completeAromaticRings

If True (default), partial aromatic rings won’t be returned.

property completeSmallestRings

If True (default is False), only complete rings present in both input molecules’ RingInfo will be returned. Implies completeAromaticRings and ringMatchesRingOnly.

property equivalentAtoms

SMARTS strings defining atoms that should be considered equivalent, e.g. [F,Cl,Br,I] so all halogens will match each other. Space-separated list allowing more than 1 class of equivalent atoms.

property exactConnectionsMatch

If True (default is False), atoms will only match atoms if they have the same number of explicit connections. E.g. the central atom of C(C)(C) won’t match either atom in CC.

property ignoreAtomAromaticity

If True, matches atoms solely on atomic number. If False, will treat aromatic and aliphatic atoms as different. Default=True.

property ignoreBondOrders

If True, will treat all bonds as the same, irrespective of order. Default=False.

property maxBestMCESs

Some pathological cases produce huge numbers of equivalent solutions that can crash the program due to memory depletion. This caps the number of such solutions to prevent this happening. Default=10000.

property maxBondMatchPairs

Too many matching bond (vertex) pairs can cause the process to run out of memory. The default of 1000 is fairly safe. Increase with caution, as memory use increases with the square of this number.

property maxFragSeparation

Maximum number of bonds between fragments in the MCES for both to be reported. Default -1 means no maximum. If exceeded, the smaller fragment will be removed.

property minCliqueSize

Normally, the minimum clique size is specified via the similarityThreshold. Sometimes it’s more convenient to specify it directly. If this is > 0, it will override the similarityThreshold. Note that this refers to the minimum number of BONDS in the MCES. Default=0.

property minFragSize

Imposes a minimum on the number of atoms in a fragment that may be part of the MCES. Default -1 means no minimum.

property returnEmptyMCES

If the estimated similarity between the 2 molecules doesn’t meet the similarityThreshold, no results are returned. If you want to know what the estimates were, set this to True, and examine the tier1Sim and tier2Sim properties of the result then returned.

property ringMatchesRingOnly

If True (default is False), ring bonds won’t match non-ring bonds.

property similarityThreshold

Threshold below which MCES won’t be run. Between 0.0 and 1.0, default=0.7.

property singleLargestFrag

Return just the single largest fragment of the MCES. It is equivalent to running with allBestMCESs=True, finding the result with the largest largestFragmentSize, and calling its largestFragmentOnly method. This option may not produce the largest possible single fragment that the molecules have in common. If you definitely want that, you may be better off using rdFMCS.

property timeout

Maximum time (in seconds) to spend on an individual MCES determination. Default 60, -1 means no limit.

class rdkit.Chem.rdRascalMCES.RascalResult

Bases: instance

Used to return RASCAL MCES results.

This class cannot be instantiated from Python; attempting to do so raises an exception.

atomMatches((RascalResult)self) list :

As bondMatches, but for atoms: the matching atoms in the MCES as tuples of atom indices from mol1 and mol2.

C++ signature :

boost::python::list atomMatches(RDKit::RascalMCES::RascalResult)

bondMatches((RascalResult)self) list :

A function returning a list of list of tuples, each inner list containing the matching bonds in the MCES as tuples of bond indices from mol1 and mol2

C++ signature :

boost::python::list bondMatches(RDKit::RascalMCES::RascalResult)

largestFragmentOnly((RascalResult)self) None :

Function that cuts the MCES down to the single largest frag. This cannot be undone.

C++ signature :

void largestFragmentOnly(RDKit::RascalMCES::RascalResult {lvalue})

property largestFragmentSize

Number of atoms in largest fragment.

property numFragments

Number of fragments in MCES.

property similarity

Johnson similarity between 2 molecules.
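For orientation, the Johnson similarity as usually defined in the RASCAL literature is the square of the atom-plus-bond count of the MCES, divided by the product of the atom-plus-bond counts of the two molecules. The helper below is a hypothetical, RDKit-free sketch of that formula; that the property computes exactly this quantity is an assumption, and the counts in the example are illustrative.

```python
def johnson_similarity(n_atoms1, n_bonds1, n_atoms2, n_bonds2,
                       n_mces_atoms, n_mces_bonds):
    """Johnson similarity from atom and bond counts (hypothetical helper)."""
    numerator = (n_mces_atoms + n_mces_bonds) ** 2
    denominator = (n_atoms1 + n_bonds1) * (n_atoms2 + n_bonds2)
    return numerator / denominator


# Two molecules with 9 heavy atoms and 9 bonds each, sharing an MCES
# of 8 atoms and 8 bonds: (8 + 8)^2 / ((9 + 9) * (9 + 9)) = 256 / 324.
sim = johnson_similarity(9, 9, 9, 9, 8, 8)
print(round(sim, 3))  # 0.79

# An MCES covering both molecules completely gives a similarity of 1.
identical = johnson_similarity(9, 9, 9, 9, 9, 9)
```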

property smartsString

SMARTS string defining the MCES.

property tier1Sim

The tier 1 similarity estimate.

property tier2Sim

The tier 2 similarity estimate.

property timedOut

Whether it timed out.