rdkit.SimDivFilters.SimilarityPickers module

class rdkit.SimDivFilters.SimilarityPickers.GenericPicker

Bases: object
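The examples on this page read results from a picker with len(picker) and picker[i], each item being a (fingerprint, score) pair. The sketch below shows one way such a sequence-style interface can be built. It is not RDKit's code: the names LazyPicker, FirstNPicker, and make_picks are hypothetical, and computing the picks lazily on first access is an assumption that the force argument of MakePicks suggests but this page does not state.

```python
class LazyPicker:
    """Hypothetical base class: picks are computed once, on first access."""

    def __init__(self):
        self._picks = None

    def make_picks(self, force=False):
        raise NotImplementedError("subclasses decide how to pick")

    def __len__(self):
        if self._picks is None:
            self.make_picks()
        return len(self._picks)

    def __getitem__(self, which):
        if self._picks is None:
            self.make_picks()
        return self._picks[which]


class FirstNPicker(LazyPicker):
    """Toy subclass that 'picks' the first n items, each with a score of 1.0."""

    def __init__(self, items, n):
        super().__init__()
        self.items = list(items)
        self.n = n
        self.calls = 0  # counts how often the picks were actually computed

    def make_picks(self, force=False):
        # Reuse cached picks unless the caller explicitly forces a recompute.
        if self._picks is not None and not force:
            return
        self.calls += 1
        self._picks = [(item, 1.0) for item in self.items[: self.n]]


picker = FirstNPicker(["a", "b", "c"], n=2)
```

With this design, repeated len() and indexing calls reuse the cached picks, and only make_picks(force=True) triggers a recomputation.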

class rdkit.SimDivFilters.SimilarityPickers.SpreadPicker(numToPick=10, probeFps=None, dataSet=None, simMetric=DataStructs.TanimotoSimilarity, expectPickles=True, onlyNames=False)

Bases: GenericPicker

A class for picking the best matches across a library

Connect to a database and build molecules:

>>> from rdkit import Chem
>>> from rdkit import RDConfig
>>> import os.path
>>> from rdkit.Dbase.DbConnection import DbConnect
>>> dbName = RDConfig.RDTestDatabase
>>> conn = DbConnect(dbName,'simple_mols1')
>>> [x.upper() for x in conn.GetColumnNames()]
['SMILES', 'ID']
>>> mols = []
>>> for smi,id in conn.GetData():
...   mol = Chem.MolFromSmiles(str(smi))
...   mol.SetProp('_Name',str(id))
...   mols.append(mol)
>>> len(mols)

Calculate fingerprints:

>>> probefps = []
>>> for mol in mols:
...   fp = Chem.RDKFingerprint(mol)
...   fp._id = mol.GetProp('_Name')
...   probefps.append(fp)

Start by finding the top matches for a single probe. This ether should pull other ethers from the database:

>>> mol = Chem.MolFromSmiles('COC')
>>> probeFp = Chem.RDKFingerprint(mol)
>>> picker = SpreadPicker(numToPick=2,probeFps=[probeFp],dataSet=probefps)
>>> len(picker)
>>> fp,score = picker[0]
>>> id = fp._id
>>> str(id)
>>> score

The results come back in order:

>>> fp,score = picker[1]
>>> id = fp._id
>>> str(id)

Now find the top matches for 2 probes. We’ll get one ether and one acid:

>>> fps = []
>>> fps.append(Chem.RDKFingerprint(Chem.MolFromSmiles('COC')))
>>> fps.append(Chem.RDKFingerprint(Chem.MolFromSmiles('CC(=O)O')))
>>> picker = SpreadPicker(numToPick=3,probeFps=fps,dataSet=probefps)
>>> len(picker)
>>> fp,score = picker[0]
>>> id = fp._id
>>> str(id)
>>> score
>>> fp,score = picker[1]
>>> id = fp._id
>>> str(id)
>>> score
>>> fp,score = picker[2]
>>> id = fp._id
>>> str(id)

dataSet should be a sequence of BitVectors or, if expectPickles is False, a set of strings that can be converted to bit vectors
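The two-probe example above suggests that picks alternate between probes. A minimal pure-Python sketch of such a round-robin ("spread") selection follows. It is an illustration under stated assumptions, not RDKit's implementation: sets of integers stand in for bit vectors, and the function names tanimoto and spread_pick are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def spread_pick(probes, data, num_to_pick):
    """Pick entries by cycling through the probes in turn (round-robin)."""
    # One ranked list per probe; the best match sits at the end so that
    # pop() returns it. Ties go to the entry with the lower index.
    ranked = [
        sorted(
            ((tanimoto(probe, fp), idx) for idx, fp in enumerate(data)),
            key=lambda t: (t[0], -t[1]),
        )
        for probe in probes
    ]
    picks = []
    taken = set()
    turn = 0
    while len(picks) < num_to_pick and any(ranked):
        row = ranked[turn % len(ranked)]
        turn += 1
        # Skip entries that another probe has already claimed.
        while row and row[-1][1] in taken:
            row.pop()
        if row:
            score, idx = row.pop()
            taken.add(idx)
            picks.append((idx, score))
    return picks


# Toy "fingerprints": indices 0-1 resemble the first probe, 2-3 the second.
data = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8, 9}, {7, 8}, {20, 21}]
probes = [{1, 2, 3}, {7, 8, 9}]
picks = spread_pick(probes, data, 3)
```

Each probe contributes its best unclaimed match on its turn, so every probe is represented in the result even when one probe has many more close matches than the others.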

MakePicks(force=False, silent=False)

class rdkit.SimDivFilters.SimilarityPickers.TopNOverallPicker(numToPick=10, probeFps=None, dataSet=None, simMetric=DataStructs.TanimotoSimilarity)

Bases: GenericPicker

A class for picking the top N overall best matches across a library

Connect to a database and build molecules:

>>> from rdkit import Chem
>>> from rdkit import RDConfig
>>> import os.path
>>> from rdkit.Dbase.DbConnection import DbConnect
>>> dbName = RDConfig.RDTestDatabase
>>> conn = DbConnect(dbName,'simple_mols1')
>>> [x.upper() for x in conn.GetColumnNames()]
['SMILES', 'ID']
>>> mols = []
>>> for smi,id in conn.GetData():
...   mol = Chem.MolFromSmiles(str(smi))
...   mol.SetProp('_Name',str(id))
...   mols.append(mol)
>>> len(mols)

Calculate fingerprints:

>>> probefps = []
>>> for mol in mols:
...   fp = Chem.RDKFingerprint(mol)
...   fp._id = mol.GetProp('_Name')
...   probefps.append(fp)

Start by finding the top matches for a single probe. This ether should pull other ethers from the database:

>>> mol = Chem.MolFromSmiles('COC')
>>> probeFp = Chem.RDKFingerprint(mol)
>>> picker = TopNOverallPicker(numToPick=2,probeFps=[probeFp],dataSet=probefps)
>>> len(picker)
>>> fp,score = picker[0]
>>> id = fp._id
>>> str(id)
>>> score

The results come back in order:

>>> fp,score = picker[1]
>>> id = fp._id
>>> str(id)

Now find the top matches for 2 probes. We’ll get one ether and one acid:

>>> fps = []
>>> fps.append(Chem.RDKFingerprint(Chem.MolFromSmiles('COC')))
>>> fps.append(Chem.RDKFingerprint(Chem.MolFromSmiles('CC(=O)O')))
>>> picker = TopNOverallPicker(numToPick=3,probeFps=fps,dataSet=probefps)
>>> len(picker)
>>> fp,score = picker[0]
>>> id = fp._id
>>> str(id)
>>> fp,score = picker[1]
>>> id = fp._id
>>> str(id)
>>> score
>>> fp,score = picker[2]
>>> id = fp._id
>>> str(id)

dataSet should be a sequence of BitVectors
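A "top N overall" selection can be sketched in pure Python as follows. This is an illustration, not RDKit's implementation: sets of integers stand in for bit vectors, the names tanimoto and top_n_overall are hypothetical, and scoring each entry by its best similarity to any probe is an assumption about what "overall best" means here.

```python
import heapq


def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def top_n_overall(probes, data, num_to_pick):
    """Score each entry by its best match to any probe; keep the top N."""
    scored = [
        (max((tanimoto(probe, fp) for probe in probes), default=-1.0), idx)
        for idx, fp in enumerate(data)
    ]
    # nlargest returns results in descending score order; ties keep
    # their original (index) order.
    best = heapq.nlargest(num_to_pick, scored, key=lambda t: t[0])
    return [(idx, score) for score, idx in best]


# Three entries resemble the first probe; only one weakly matches the second.
data = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 4, 5}, {7, 8}]
probes = [{1, 2, 3}, {7, 8, 9, 10}]
picks = top_n_overall(probes, data, 3)
```

In this toy data set all three picks come from the first probe's neighborhood, because nothing in this scheme reserves a slot for each probe. That is the main design difference from a selection that takes turns between probes.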
