rdkit.ML.Data.MLData module

Classes to help work with data sets.

class rdkit.ML.Data.MLData.MLDataSet(data, nVars=None, nPts=None, nPossibleVals=None, qBounds=None, varNames=None, ptNames=None, nResults=1)

Bases: object

A data set for holding general data (floats, ints, and strings)

Note: this is intended to be a read-only data structure; after calling the constructor, do not modify its contents.

Constructor

Arguments

data: a list of lists containing the data. The data are copied, so the
caller's lists are not overwritten.

nVars: the number of variables

nPts: the number of points

nPossibleVals: a list containing the number of possible values
for each variable (should contain 0 when not relevant). This is _nVars_ long.

qBounds: a list of lists containing quantization bounds for variables
that are to be quantized. This class does not quantize the variables itself; it merely stores the quantization bounds. An empty sublist indicates no quantization for a given variable. This is _nVars_ long.

varNames: a list of the names of the variables.
This is _nVars_ long.

ptNames: the names (labels) of the individual data points.
This is _nPts_ long.

nResults: the number of results columns in the data lists. This is usually
1, but can be higher.
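The class only stores quantization bounds; applying them is left to the caller. The following is a minimal sketch of one common convention, assuming that a value's bin index is the number of bounds it meets or exceeds. The helper name `quantize_value` and the sample bounds are hypothetical and are not part of RDKit, whose own quantization routines may treat bin edges differently.

```python
from bisect import bisect_right

def quantize_value(value, bounds):
    """Map value to a bin index using sorted bounds.

    An empty bounds list means "no quantization", so the value is
    returned unchanged (mirroring the empty-sublist convention above).
    """
    if not bounds:
        return value
    # Count how many bounds are <= value; that count is the bin index.
    return bisect_right(bounds, value)

# One sublist of bounds per variable (hypothetical values).
q_bounds = [[0.5, 1.5], [], [10.0]]
row = [0.2, 'red', 12.0]

quantized = [quantize_value(v, b) for v, b in zip(row, q_bounds)]
print(quantized)  # [0, 'red', 1]
```

With `k` bounds, a quantized variable takes one of `k + 1` bin indices, which is the kind of count that nPossibleVals is meant to record.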

AddPoint(pt)

AddPoints(pts, names)

GetAllData()

returns a copy of the data

GetInputData()

returns the input data

Note

_inputData_ means the examples without their result fields
(the last _NResults_ entries)
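The split between input fields and result fields described in the note can be sketched in plain Python. This is an illustration of the layout only, not RDKit's implementation; the function name `split_examples` and the sample rows are hypothetical.

```python
def split_examples(data, n_results=1):
    """Split each example into (input fields, result fields).

    The result fields are taken to be the last n_results entries of
    each row. n_results must be at least 1, because row[:-0] would
    yield an empty list rather than the whole row.
    """
    if n_results < 1:
        raise ValueError("n_results must be >= 1")
    inputs = [row[:-n_results] for row in data]
    results = [row[-n_results:] for row in data]
    return inputs, results

# Two examples, each with two input fields and one result field.
data = [[1.0, 'a', 0], [2.5, 'b', 1]]
inputs, results = split_examples(data, n_results=1)

print(inputs)   # [[1.0, 'a'], [2.5, 'b']]
print(results)  # [[0], [1]]
```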

GetNPossibleVals()

GetNPts()

GetNResults()

GetNVars()

GetNamedData()

returns a list of named examples

Note

a named example is the result of prepending the example
name to the data list
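A named example, as described in the note, can be built by hand as follows. This is a minimal sketch in plain Python rather than RDKit's own code; the function name `name_examples` and the point names are hypothetical.

```python
def name_examples(data, pt_names):
    """Return new rows with each point's name prepended to its data."""
    if len(pt_names) != len(data):
        raise ValueError("need exactly one name per data point")
    # Build fresh lists so the original rows are left untouched.
    return [[name] + list(row) for name, row in zip(pt_names, data)]

data = [[1.0, 0], [2.5, 1]]
pt_names = ['mol-1', 'mol-2']

named = name_examples(data, pt_names)
print(named)  # [['mol-1', 1.0, 0], ['mol-2', 2.5, 1]]
```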

GetPtNames()

GetQuantBounds()

GetResults()

returns the result fields from each example

GetVarNames()

class rdkit.ML.Data.MLData.MLQuantDataSet(data, nVars=None, nPts=None, nPossibleVals=None, qBounds=None, varNames=None, ptNames=None, nResults=1)

Bases: MLDataSet

A data set for holding quantized data.

Note

This is intended to be a read-only data structure; after calling the constructor, do not modify its contents.

Main differences from MLDataSet:

data are stored in a numpy array, since they are homogeneous

results are assumed to be quantized (i.e. no qBounds entry is required)

Constructor

Arguments

data: a list of lists containing the data. The data are copied, so the
caller's lists are not overwritten.

nVars: the number of variables

nPts: the number of points

nPossibleVals: a list containing the number of possible values
for each variable (should contain 0 when not relevant). This is _nVars_ long.

qBounds: a list of lists containing quantization bounds for variables
that are to be quantized. This class does not quantize the variables itself; it merely stores the quantization bounds. An empty sublist indicates no quantization for a given variable. This is _nVars_ long.

varNames: a list of the names of the variables.
This is _nVars_ long.

ptNames: the names (labels) of the individual data points.
This is _nPts_ long.

nResults: the number of results columns in the data lists. This is usually
1, but can be higher.

GetAllData()

returns a copy of the data

GetInputData()

returns the input data

Note

_inputData_ means the examples without their result fields
(the last _NResults_ entries)

GetNamedData()

returns a list of named examples

Note

a named example is the result of prepending the example
name to the data list

GetResults()

returns the result fields from each example
