rdkit.Chem.AtomPairs.Utils module

rdkit.Chem.AtomPairs.Utils.BitsInCommon(v1, v2)

Returns the number of bits in common between two vectors.

Arguments:

  • v1, v2: two vectors (sequences of bit IDs)

Returns: an integer

Notes

  • the vectors must be sorted

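  • duplicate bit IDs are counted more than once: an ID that occurs m times in one vector and n times in the other contributes min(m, n) to the count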

>>> BitsInCommon( (1,2,3,4,10), (2,4,6) )
2

Here’s how duplicates are handled:

>>> BitsInCommon( (1,2,2,3,4), (2,2,4,5,6) )
3
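The duplicate handling shown above matches a multiset intersection. The following is a minimal pure-Python sketch under that reading of the examples; the helper name bits_in_common is hypothetical and this is not RDKit's implementation. Unlike the library function, the sketch does not need sorted input.

```python
from collections import Counter


def bits_in_common(v1, v2):
    """Count shared bit IDs, treating each vector as a multiset.

    Hypothetical helper: an ID that occurs m times in v1 and n times
    in v2 contributes min(m, n) to the total.
    """
    shared = Counter(v1) & Counter(v2)  # per-ID minimum of the two counts
    return sum(shared.values())


print(bits_in_common((1, 2, 3, 4, 10), (2, 4, 6)))        # 2
print(bits_in_common((1, 2, 2, 3, 4), (2, 2, 4, 5, 6)))   # 3
```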
rdkit.Chem.AtomPairs.Utils.CosineSimilarity(v1, v2)

Implements the cosine similarity metric.

This is the metric recommended in the LaSSI paper.

Arguments:

  • v1, v2: two vectors (sequences of bit IDs)

Returns: a float.

Notes

  • the vectors must be sorted

>>> print('%.3f'%CosineSimilarity( (1,2,3,4,10), (2,4,6) ))
0.516
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), (2,2,4,5,6) ))
0.714
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), (1,2,2,3,4) ))
1.000
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), (5,6,7) ))
0.000
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), () ))
0.000
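The values above can be reproduced by dividing the dot product of the two vectors by the square root of the product of each vector's dot product with itself. The sketch below assumes that formula and assumes that a bit ID shared m and n times contributes min(m, n) squared to a dot product. Both helper names are hypothetical, and the code is one reading of the examples rather than the library's source.

```python
import math
from collections import Counter


def dot(v1, v2):
    """Hypothetical helper: sum of min(m, n)**2 over the shared bit IDs."""
    c1, c2 = Counter(v1), Counter(v2)
    return sum(min(c1[k], c2[k]) ** 2 for k in c1.keys() & c2.keys())


def cosine_similarity(v1, v2):
    """Cosine similarity of two bit-ID vectors; 0.0 if either is empty."""
    denom = math.sqrt(dot(v1, v1) * dot(v2, v2))
    return dot(v1, v2) / denom if denom else 0.0


print('%.3f' % cosine_similarity((1, 2, 3, 4, 10), (2, 4, 6)))        # 0.516
print('%.3f' % cosine_similarity((1, 2, 2, 3, 4), (2, 2, 4, 5, 6)))   # 0.714
print('%.3f' % cosine_similarity((1, 2, 2, 3, 4), ()))                # 0.000
```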
rdkit.Chem.AtomPairs.Utils.DiceSimilarity(v1, v2, bounds=None)

Implements the Dice similarity metric.

This is the metric recommended in both the topological torsions and atom pairs papers.

Arguments:

  • v1, v2: two vectors (sequences of bit IDs)

Returns: a float.

Notes

  • the vectors must be sorted

>>> DiceSimilarity( (1,2,3), (1,2,3) )
1.0
>>> DiceSimilarity( (1,2,3), (5,6) )
0.0
>>> DiceSimilarity( (1,2,3,4), (1,3,5,7) )
0.5
>>> DiceSimilarity( (1,2,3,4,5,6), (1,3) )
0.5

Note that duplicate bit IDs count multiple times:

>>> DiceSimilarity( (1,1,3,4,5,6), (1,1) )
0.5

but only if they are duplicated in both vectors:

>>> DiceSimilarity( (1,1,3,4,5,6), (1,) )==2./7
True

Edge case: two empty vectors give a similarity of 0.0:

>>> DiceSimilarity( (), () )
0.0

The optional bounds argument acts as a cutoff that can force the result to 0.0:

>>> DiceSimilarity( (1,1,3,4), (1,1))
0.666...
>>> DiceSimilarity( (1,1,3,4), (1,1), bounds=0.3)
0.666...
>>> DiceSimilarity( (1,1,3,4), (1,1), bounds=0.33)
0.666...
>>> DiceSimilarity( (1,1,3,4,5,6), (1,1), bounds=0.34)
0.0
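All of the values above follow from twice the number of shared bit IDs divided by the combined length of the two vectors. The sketch below assumes that formula. Its handling of bounds is a guess that happens to fit the examples: the result is forced to 0.0 when the shorter vector makes up less than bounds of the combined length. The name dice_similarity is hypothetical and the code is not taken from RDKit.

```python
from collections import Counter


def dice_similarity(v1, v2, bounds=None):
    """Dice similarity of two bit-ID vectors, treated as multisets."""
    total = len(v1) + len(v2)
    if total == 0:
        return 0.0  # both vectors are empty
    # Assumed cutoff rule: return 0.0 when the shorter vector is less
    # than `bounds` of the combined length.
    if bounds and min(len(v1), len(v2)) / total < bounds:
        return 0.0
    common = sum((Counter(v1) & Counter(v2)).values())
    return 2.0 * common / total


print(dice_similarity((1, 2, 3, 4), (1, 3, 5, 7)))                     # 0.5
print(dice_similarity((1, 1, 3, 4, 5, 6), (1,)) == 2. / 7)             # True
print(dice_similarity((1, 1, 3, 4, 5, 6), (1, 1), bounds=0.34))        # 0.0
```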
rdkit.Chem.AtomPairs.Utils.Dot(v1, v2)

Returns the dot product of two vectors.

Arguments:

  • v1, v2: two vectors (sequences of bit IDs)

Returns: an integer

Notes

  • the vectors must be sorted

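  • duplicate bit IDs are counted more than once: in the examples below, an ID that occurs m times in one vector and n times in the other contributes the square of min(m, n)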

>>> Dot( (1,2,3,4,10), (2,4,6) )
2

Here’s how duplicates are handled:

>>> Dot( (1,2,2,3,4), (2,2,4,5,6) )
5
>>> Dot( (1,2,2,3,4), (2,4,5,6) )
2
>>> Dot( (1,2,2,3,4), (5,6) )
0
>>> Dot( (), (5,6) )
0
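The results above are consistent with each shared bit ID contributing the square of the smaller of its two occurrence counts, which is why a doubled ID scores 4 against another doubled ID but only 1 against a single one. The sketch below assumes that rule; the name dot is a hypothetical stand-in and the code is not RDKit's implementation.

```python
from collections import Counter


def dot(v1, v2):
    """Hypothetical helper: sum of min(m, n)**2 over the shared bit IDs.

    m and n are the number of times an ID occurs in v1 and v2.
    """
    c1, c2 = Counter(v1), Counter(v2)
    return sum(min(c1[k], c2[k]) ** 2 for k in c1.keys() & c2.keys())


print(dot((1, 2, 3, 4, 10), (2, 4, 6)))        # 2
print(dot((1, 2, 2, 3, 4), (2, 2, 4, 5, 6)))   # 5: 2**2 for ID 2, 1 for ID 4
print(dot((1, 2, 2, 3, 4), (2, 4, 5, 6)))      # 2: 1 for ID 2, 1 for ID 4
```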
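rdkit.Chem.AtomPairs.Utils.ExplainAtomCode(code, branchSubtract=0, includeChirality=False)

Decodes an atom code. As the examples below show, the result is a tuple of (element symbol, number of neighbors, number of pi electrons); if includeChirality is True, a fourth element holds the chirality label.

Arguments:

  • code: the atom code to be considered

  • branchSubtract: (optional) the constant that was subtracted from the number of neighbors before it was integrated into the code. This is used by the topological torsions code.

  • includeChirality: (optional) whether chirality was included when the atom code was generated.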

>>> m = Chem.MolFromSmiles('C=CC(=O)O')
>>> code = GetAtomCode(m.GetAtomWithIdx(0))
>>> ExplainAtomCode(code)
('C', 1, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(1))
>>> ExplainAtomCode(code)
('C', 2, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(2))
>>> ExplainAtomCode(code)
('C', 3, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(3))
>>> ExplainAtomCode(code)
('O', 1, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(4))
>>> ExplainAtomCode(code)
('O', 1, 0)

Chirality can be included too, which adds an extra element to the tuple:

>>> m = Chem.MolFromSmiles('C[C@H](F)Cl')
>>> code = GetAtomCode(m.GetAtomWithIdx(1))
>>> ExplainAtomCode(code)
('C', 3, 0)
>>> code = GetAtomCode(m.GetAtomWithIdx(1),includeChirality=True)
>>> ExplainAtomCode(code,includeChirality=True)
('C', 3, 0, 'R')

Note that if we don’t ask for chirality, we still get the right answer even when the atom code was calculated with chirality:

>>> ExplainAtomCode(code)
('C', 3, 0)

Non-chiral atoms return '' (an empty string) in the fourth field:

>>> code = GetAtomCode(m.GetAtomWithIdx(0),includeChirality=True)
>>> ExplainAtomCode(code,includeChirality=True)
('C', 1, 0, '')

Switching the chirality changes the result:

>>> m = Chem.MolFromSmiles('C[C@@H](F)Cl')
>>> code = GetAtomCode(m.GetAtomWithIdx(1),includeChirality=True)
>>> ExplainAtomCode(code,includeChirality=True)
('C', 3, 0, 'S')

rdkit.Chem.AtomPairs.Utils.NumPiElectrons(atom)

Returns the number of electrons an atom is using for pi bonding.

>>> m = Chem.MolFromSmiles('C=C')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
1
>>> m = Chem.MolFromSmiles('C#CC')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
2
>>> NumPiElectrons(m.GetAtomWithIdx(1))
2
>>> m = Chem.MolFromSmiles('O=C=CC')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
1
>>> NumPiElectrons(m.GetAtomWithIdx(1))
2
>>> NumPiElectrons(m.GetAtomWithIdx(2))
1
>>> NumPiElectrons(m.GetAtomWithIdx(3))
0
>>> m = Chem.MolFromSmiles('c1ccccc1')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
1

FIX: this behaves oddly in these cases:

>>> m = Chem.MolFromSmiles('S(=O)(=O)')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
2
>>> m = Chem.MolFromSmiles('S(=O)(=O)(O)O')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
0

In the second case, the S atom is tagged as sp3 hybridized.