rdkit.ML.InfoTheory.BitRank module

Functionality for ranking bits by information gain

Definitions used in this module

  • sequence: an object capable of containing other objects which supports __getitem__() and __len__(). Examples of these include lists, tuples, and Numeric arrays.

  • IntVector: an object containing integers which supports __getitem__() and __len__(). Examples include lists, tuples, Numeric Arrays, and BitVects.

NOTE: Neither sequences nor IntVectors need to support item assignment. It is perfectly acceptable for them to be read-only, so long as they are random-access.
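To make these definitions concrete, here is a small sketch that does not use RDKit. `ReadOnlyBits` and `bit_column` are hypothetical names introduced only for illustration: the class supports `__getitem__()` and `__len__()` but not item assignment, which is enough for it to act as an IntVector, and the outer tuple serves as a read-only sequence.

```python
class ReadOnlyBits:
    """Hypothetical read-only IntVector: random access, no item assignment."""

    def __init__(self, bits):
        self._bits = tuple(int(b) for b in bits)  # immutable copy of the values

    def __getitem__(self, idx):
        return self._bits[idx]

    def __len__(self):
        return len(self._bits)


def bit_column(vects, which_bit):
    """Collect the value of one bit from every IntVector in a sequence."""
    return [vects[i][which_bit] for i in range(len(vects))]


# A tuple (a read-only sequence) holding three different kinds of IntVector.
vects = (ReadOnlyBits([1, 0, 1]), (0, 0, 1), [1, 1, 0])
print(bit_column(vects, 0))  # [1, 0, 1]
```

Only random access by index and a length are used, so any container meeting those two requirements works.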

rdkit.ML.InfoTheory.BitRank.AnalyzeSparseVects(bitVects, actVals)

Arguments

  • bitVects: a sequence containing SBVs

  • actVals: a sequence

Returns

a list of floats

Notes

  • The vectors must be bit vectors (SBVs), and the activity values must be binary.
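The calculation behind this kind of sparse analysis can be sketched in plain Python. This is an illustration under stated assumptions, not RDKit's implementation: each sparse vector is represented by a Python set of on-bit indices (a stand-in for an SBV), activities are 0 or 1, information gain is taken to be the usual entropy reduction H(activity) minus H(activity | bit), and only bits that are set in at least one vector are scored. The function names and the dict return type are hypothetical.

```python
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts if c)


def info_gain(table):
    """Information gain of a contingency table (rows: bit values, cols: activities)."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    cond = sum(sum(row) / n * entropy(row) for row in table if sum(row))
    return entropy(col_totals) - cond


def sparse_info_gains(on_bit_sets, acts):
    """Hypothetical: per-bit information gain for sparse vectors given as sets
    of on-bit indices; acts must be binary (0 = inactive, 1 = active)."""
    if len(on_bit_sets) != len(acts):
        raise ValueError("vector and activity lists must be the same length")
    n_act = sum(1 for a in acts if a)
    n_inact = len(acts) - n_act
    act_on, inact_on = {}, {}  # bit index -> number of vectors with it set
    for bits, act in zip(on_bit_sets, acts):
        target = act_on if act else inact_on
        for b in bits:
            target[b] = target.get(b, 0) + 1
    gains = {}
    for b in sorted(set(act_on) | set(inact_on)):  # only bits seen at least once
        a_on, i_on = act_on.get(b, 0), inact_on.get(b, 0)
        table = [[n_inact - i_on, n_act - a_on],  # bit off: [inactive, active]
                 [i_on, a_on]]                    # bit on:  [inactive, active]
        gains[b] = info_gain(table)
    return gains


print(sparse_info_gains([{0, 2}, {0}, {1}, {1, 2}], [1, 1, 0, 0]))
# {0: 1.0, 1: 1.0, 2: 0.0}
```

Bits 0 and 1 each separate the actives from the inactives perfectly, so they carry the full 1 bit of information; bit 2 is set equally often in both classes and carries none.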

rdkit.ML.InfoTheory.BitRank.CalcInfoGains(bitVects, actVals, nPossibleActs, nPossibleBitVals=2)

Calculates the information gain of each bit for a set of points and activity values

Arguments

  • bitVects: a sequence containing IntVectors

  • actVals: a sequence

  • nPossibleActs: the (integer) number of possible activity values.

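  • nPossibleBitVals: (optional) if specified, this integer gives the number of distinct values (one more than the maximum value) attainable by the (increasingly inaccurately named) bits in _bitVects_. The default of 2 corresponds to ordinary 0/1 bits.

Returns
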
a list of floats
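A dense counterpart can be sketched the same way. The helper names below are hypothetical, and the entropy-based gain (base-2 logarithms, rows of the counts matrix indexed by bit value and columns by activity) is an assumption about the metric rather than a copy of RDKit's code. Bit values must lie in `range(n_possible_bit_vals)` and activities in `range(n_possible_acts)`, because both are used as indices.

```python
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts if c)


def info_gain(counts):
    """Information gain for a counts matrix (rows: bit values, cols: activities):
    H(activity) minus the size-weighted entropy of each row."""
    n = sum(sum(row) for row in counts)
    act_totals = [sum(col) for col in zip(*counts)]
    cond = sum(sum(row) / n * entropy(row) for row in counts if sum(row))
    return entropy(act_totals) - cond


def calc_info_gains(bit_vects, act_vals, n_possible_acts, n_possible_bit_vals=2):
    """Hypothetical pure-Python analogue of CalcInfoGains: one gain per bit."""
    if len(bit_vects) != len(act_vals):
        raise ValueError("vector and activity lists must be the same length")
    gains = []
    for bit in range(len(bit_vects[0])):
        # Tally (bit value, activity) pairs for this bit.
        counts = [[0] * n_possible_acts for _ in range(n_possible_bit_vals)]
        for vect, act in zip(bit_vects, act_vals):
            counts[vect[bit]][act] += 1
        gains.append(info_gain(counts))
    return gains


vects = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
acts = [1, 1, 0, 0]
print(calc_info_gains(vects, acts, n_possible_acts=2))  # [1.0, 0.0, 0.0]
```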

rdkit.ML.InfoTheory.BitRank.FormCounts(bitVects, actVals, whichBit, nPossibleActs, nPossibleBitVals=2)

Generates the counts matrix for a particular bit

Arguments

  • bitVects: a sequence containing IntVectors

  • actVals: a sequence

  • whichBit: an integer, the bit number to use.

  • nPossibleActs: the (integer) number of possible activity values.

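  • nPossibleBitVals: (optional) if specified, this integer gives the number of distinct values (one more than the maximum value) attainable by the (increasingly inaccurately named) bits in _bitVects_. The default of 2 corresponds to ordinary 0/1 bits.

Returns
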
a Numeric array with the counts

Notes

This function is intended mainly for internal use.
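The counts matrix for a single bit can be sketched as a nested list. In this hypothetical `form_counts`, `counts[v][a]` holds the number of points whose chosen bit equals `v` and whose activity equals `a`; the row/column layout is an assumption made for the sketch.

```python
def form_counts(bit_vects, act_vals, which_bit, n_possible_acts, n_possible_bit_vals=2):
    """Hypothetical pure-Python analogue of FormCounts.

    Returns a list of lists: counts[v][a] is the number of points whose bit
    `which_bit` has value v and whose activity is a."""
    if len(bit_vects) != len(act_vals):
        raise ValueError("vector and activity lists must be the same length")
    counts = [[0] * n_possible_acts for _ in range(n_possible_bit_vals)]
    for vect, act in zip(bit_vects, act_vals):
        counts[vect[which_bit]][act] += 1  # values double as indices
    return counts


vects = [(1, 0), (1, 1), (0, 1), (0, 0), (1, 0)]
acts = [1, 1, 0, 0, 0]
print(form_counts(vects, acts, which_bit=0, n_possible_acts=2))  # [[2, 0], [1, 2]]
```

Each row sums to the number of points with that bit value, and the whole matrix sums to the number of points.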

rdkit.ML.InfoTheory.BitRank.RankBits(bitVects, actVals, nPossibleBitVals=2, metricFunc=<function CalcInfoGains>)

Rank a set of bits according to a metric function

Arguments

  • bitVects: a sequence containing IntVectors

  • actVals: a sequence

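  • nPossibleBitVals: (optional) if specified, this integer gives the number of distinct values (one more than the maximum value) attainable by the (increasingly inaccurately named) bits in _bitVects_. The default of 2 corresponds to ordinary 0/1 bits.

  • metricFunc: (optional) the metric function to be used. It should accept the same arguments as _CalcInfoGains()_ and return one value per bit; see _CalcInfoGains()_ for a description of that signature.

Returns
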
A 2-tuple containing:

  • the bit indices, ordered from highest to lowest metric value (a list of ints)

  • the metric calculated for each bit (a list of floats)
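The ranking step can be sketched as scoring every bit with a metric function and sorting the bit indices from highest to lowest score. Everything below is hypothetical: `rank_bits` assumes activities are the integers 0 to k-1 (so it infers k as `max(act_vals) + 1`), and `fraction_on_in_actives` is a toy metric used only so the example runs without RDKit; in practice the metric would be an information-gain function such as `CalcInfoGains`.

```python
def rank_bits(bit_vects, act_vals, metric_func, n_possible_bit_vals=2):
    """Hypothetical analogue of RankBits: score every bit, then sort.

    Assumes activities are integers 0..k-1, so k = max(act_vals) + 1."""
    n_possible_acts = max(act_vals) + 1
    metrics = metric_func(bit_vects, act_vals, n_possible_acts, n_possible_bit_vals)
    # Highest metric first; sorted() is stable, so ties keep index order.
    order = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    return order, metrics


def fraction_on_in_actives(bit_vects, act_vals, n_possible_acts, n_possible_bit_vals):
    """Toy stand-in metric with the same call signature: fraction of active
    (act == 1) points that have each bit set. The last two arguments are unused."""
    actives = [v for v, a in zip(bit_vects, act_vals) if a == 1]
    return [sum(v[b] for v in actives) / len(actives) for b in range(len(bit_vects[0]))]


vects = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
acts = [1, 1, 0, 0]
order, metrics = rank_bits(vects, acts, fraction_on_in_actives)
print(order, metrics)  # [0, 1, 2] [1.0, 0.5, 0.5]
```

Because `sorted()` is stable even with `reverse=True`, tied bits stay in index order here; the tie order produced by RDKit itself may differ.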

rdkit.ML.InfoTheory.BitRank.SparseRankBits(bitVects, actVals, metricFunc=<function AnalyzeSparseVects>)

Rank a set of bits according to a metric function

Arguments

  • bitVects: a sequence containing SBVs

  • actVals: a sequence

  • metricFunc: (optional) the metric function to be used. The default is _AnalyzeSparseVects()_; see that function for a description of the signature a replacement must follow.

Returns

A 2-tuple containing:

  • the relative order of the bits (a list of ints)

  • the metric calculated for each bit (a list of floats)

Notes

  • The vectors must be bit vectors (SBVs), and the activity values must be binary.
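For sparse vectors it is natural to score only the bits that are actually set. The sketch below assumes, hypothetically, that the metrics arrive as a dict mapping bit index to score, and returns the bit indices and their scores from best to worst; the tuple actually returned by `SparseRankBits` may be laid out differently.

```python
def sparse_rank_bits(gains_by_bit):
    """Hypothetical: order sparse bits from highest to lowest metric.

    gains_by_bit maps a bit index to its metric value; only bits that were
    ever set need to appear, which keeps this cheap for very long SBVs."""
    ranked = sorted(gains_by_bit.items(), key=lambda item: item[1], reverse=True)
    order = [bit for bit, _ in ranked]
    metrics = [gain for _, gain in ranked]
    return order, metrics


gains = {1024: 0.31, 7: 0.88, 90210: 0.05}
order, metrics = sparse_rank_bits(gains)
print(order)  # [7, 1024, 90210]
```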