rdkit.ML.Data.MLData module

Classes to help work with data sets.

class rdkit.ML.Data.MLData.MLDataSet(data, nVars=None, nPts=None, nPossibleVals=None, qBounds=None, varNames=None, ptNames=None, nResults=1)

Bases: object

A data set for holding general data (floats, ints, and strings)

Note

This is intended to be a read-only data structure (i.e. after calling the constructor you should not modify it).

Constructor

Arguments

  • data: a list of lists containing the data. The data are copied, so the caller's lists are not overwritten.

  • nVars: the number of variables

  • nPts: the number of points

  • nPossibleVals: a list containing the number of possible values for each variable (should contain 0 when not relevant). This is _nVars_ long.

  • qBounds: a list of lists containing quantization bounds for variables which are to be quantized. Note that this class does not quantize the variables itself; it merely stores the quantization bounds. An empty sublist indicates no quantization for a given variable. This is _nVars_ long.

  • varNames: a list of the names of the variables. This is _nVars_ long.

  • ptNames: the names (labels) of the individual data points. This is _nPts_ long.

  • nResults: the number of results columns in the data lists. This is usually 1, but can be higher.
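The length constraints in the argument list can be checked before the data set is built. The following is a minimal sketch in plain Python that does not call RDKit. The example values, the variable and point names, and the `check_lengths` helper are all hypothetical and only illustrate the layout described above.

```python
# Hypothetical example data: each row is [var0, var1, var2, result].
data = [
    [1.2, 0, "a", 1],
    [3.4, 1, "b", 0],
    [0.7, 2, "a", 1],
]
n_results = 1
n_vars = len(data[0]) - n_results  # 3 input variables
n_pts = len(data)                  # 3 points

var_names = ["logP", "ring_count", "source"]  # nVars long
pt_names = ["mol-1", "mol-2", "mol-3"]        # nPts long

# One sublist per variable; an empty sublist means "not quantized".
q_bounds = [[1.0, 3.0], [], []]               # nVars long

# Illustrative counts only; 0 marks a variable where the count is not relevant.
n_possible_vals = [3, 3, 0]                   # nVars long


def check_lengths():
    """Return True when each per-variable and per-point list has the documented length."""
    per_var = (var_names, q_bounds, n_possible_vals)
    return all(len(x) == n_vars for x in per_var) and len(pt_names) == n_pts
```

With RDKit installed, these values would be passed to the constructor by keyword, following the signature shown above: `MLDataSet(data, nVars=n_vars, nPts=n_pts, nPossibleVals=n_possible_vals, qBounds=q_bounds, varNames=var_names, ptNames=pt_names, nResults=n_results)`.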

AddPoint(pt)
AddPoints(pts, names)
GetAllData()

returns a copy of the data

GetInputData()

returns the input data

Note

_inputData_ means the examples without their result fields (the last _nResults_ entries)
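The split between input fields and result fields can be pictured with a short sketch. This is plain Python that mirrors the description in the note, not the RDKit implementation; `split_example` is a hypothetical helper, and how the library packages the result fields (single value or list) is not specified here.

```python
def split_example(example, n_results=1):
    """Split one data row into (input fields, result fields).

    Assumes n_results >= 1, matching the constructor default.
    """
    return example[:-n_results], example[-n_results:]


row = [1.2, 0, 1]

# With one results column, the last entry is the result.
inputs, results = split_example(row)

# With two results columns, the last two entries are results.
inputs2, results2 = split_example(row, n_results=2)
```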

GetNPossibleVals()
GetNPts()
GetNResults()
GetNVars()
GetNamedData()

returns a list of named examples

Note

a named example is the result of prepending the example name to the data list
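As an illustration of what a named example looks like, the sketch below prepends each point name to its data row. It is plain Python written from the description in the note; `name_examples` and the sample names are hypothetical, and the code does not call RDKit.

```python
def name_examples(pt_names, data):
    """Prepend each point name to a copy of its data row."""
    return [[name] + list(row) for name, row in zip(pt_names, data)]


pt_names = ["mol-1", "mol-2"]
data = [[1.2, 0, 1], [3.4, 1, 0]]

named = name_examples(pt_names, data)
# named[0] is ["mol-1", 1.2, 0, 1]; the original rows are left unchanged.
```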

GetPtNames()
GetQuantBounds()
GetResults()

Returns the result fields from each example

GetVarNames()
class rdkit.ML.Data.MLData.MLQuantDataSet(data, nVars=None, nPts=None, nPossibleVals=None, qBounds=None, varNames=None, ptNames=None, nResults=1)

Bases: MLDataSet

a data set for holding quantized data

Note

This is intended to be a read-only data structure (i.e. after calling the constructor you should not modify it).

Main differences from MLDataSet

  1. data are stored in a numpy array since they are homogeneous

  2. results are assumed to be quantized (i.e. no qBounds entry is required)

Constructor

Arguments

  • data: a list of lists containing the data. The data are copied, so the caller's lists are not overwritten.

  • nVars: the number of variables

  • nPts: the number of points

  • nPossibleVals: a list containing the number of possible values for each variable (should contain 0 when not relevant). This is _nVars_ long.

  • qBounds: a list of lists containing quantization bounds for variables which are to be quantized. Note that this class does not quantize the variables itself; it merely stores the quantization bounds. An empty sublist indicates no quantization for a given variable. This is _nVars_ long.

  • varNames: a list of the names of the variables. This is _nVars_ long.

  • ptNames: the names (labels) of the individual data points. This is _nPts_ long.

  • nResults: the number of results columns in the data lists. This is usually 1, but can be higher.
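Since the class only stores quantization bounds and does not apply them, raw values have to be binned before they are handed to a quantized data set. The sketch below shows one possible binning convention using the standard-library `bisect` module. It is an assumption, not RDKit's own quantization code: here the bin index is the number of bounds strictly below the value, and `quantize_value` and `quantize_row` are hypothetical helpers.

```python
from bisect import bisect_left


def quantize_value(value, bounds):
    """Return the bin index for value, or value unchanged if bounds is empty.

    Assumed convention: the bin index is the count of bounds strictly
    below the value. bounds must be sorted in ascending order.
    """
    if not bounds:
        return value
    return bisect_left(bounds, value)


def quantize_row(row, q_bounds):
    """Quantize the leading columns of row; columns beyond q_bounds are kept as-is."""
    head = [quantize_value(v, b) for v, b in zip(row, q_bounds)]
    return head + list(row[len(q_bounds):])


# nVars long: the first variable is binned, the second is already an integer.
q_bounds = [[1.0, 3.0], []]
raw = [[0.5, 2, 1], [2.0, 0, 0], [3.5, 1, 1]]

quantized = [quantize_row(r, q_bounds) for r in raw]
# quantized is [[0, 2, 1], [1, 0, 0], [2, 1, 1]]
```

The resulting rows contain only integers, which fits the homogeneous storage described for this class.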

GetAllData()

returns a copy of the data

GetInputData()

returns the input data

Note

_inputData_ means the examples without their result fields (the last _nResults_ entries)

GetNamedData()

returns a list of named examples

Note

a named example is the result of prepending the example name to the data list

GetResults()

Returns the result fields from each example