
This module contains items related to data, such as raw CSV files and data objects.

Dataset base

Data structure for all datasets that come with the framework.

class Dataset(name, filename_or_path, features, cont_features, sens_attr_spec, class_label_spec, discrete_only, num_samples, discard_non_one_hot=False, map_to_binary=False, s_prefix=None, class_label_prefix=None, discrete_feature_groups=None)

Bases: object

Data structure that holds all the information needed to load a given dataset.

Parameters

  • discard_non_one_hot (bool) – If True, discard the samples whose s or y values are not correctly one-hot encoded.

  • map_to_binary (bool) – If True, convert labels from {-1, 1} to {0, 1}.

  • name (str) –

  • filename_or_path (Union[str, pathlib.Path]) –

  • features (Sequence[str]) –

  • cont_features (Sequence[str]) –

  • sens_attr_spec (Union[str, Mapping[str,]]) –

  • class_label_spec (Union[str, Mapping[str,]]) –

  • discrete_only (bool) –

  • num_samples (int) –

  • s_prefix (Optional[Sequence[str]]) –

  • class_label_prefix (Optional[Sequence[str]]) –

  • discrete_feature_groups (Optional[Dict[str, List[str]]]) –
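As a rough illustration of the discard_non_one_hot and map_to_binary options, the sketch below applies the same two transformations to plain Python lists instead of DataFrames. It assumes that s and y are stored as one-hot 0/1 columns and that binary labels arrive as -1/1; `map_labels_to_binary` and `keep_one_hot_rows` are hypothetical helpers, not part of the framework.

```python
from typing import List, Sequence


# Hypothetical helper, not the framework's API.
def map_labels_to_binary(labels: Sequence[int]) -> List[int]:
    """Map labels from {-1, 1} to {0, 1}."""
    # (-1 + 1) // 2 == 0 and (1 + 1) // 2 == 1
    return [(label + 1) // 2 for label in labels]


# Hypothetical helper, not the framework's API.
def keep_one_hot_rows(rows: Sequence[Sequence[int]]) -> List[List[int]]:
    """Drop rows that are not a valid one-hot encoding (exactly one 1, all others 0)."""
    return [
        list(row)
        for row in rows
        if all(v in (0, 1) for v in row) and sum(row) == 1
    ]


print(map_labels_to_binary([-1, 1, 1, -1]))  # [0, 1, 1, 0]
print(keep_one_hot_rows([[1, 0], [0, 1], [1, 1], [0, 0]]))  # [[1, 0], [0, 1]]
```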

Return type



Number of elements in the dataset.

Return type


property class_labels: List[str]

Get the list of class labels.

property continuous_features: List[str]

List of features that are continuous.

property disc_feature_groups: Optional[Dict[str, List[str]]]

Dictionary of feature groups.

property discrete_features: List[str]

List of features that are discrete.

expand_labels(label, label_type)

Expand a label given as an index into all of its sub-features.
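The idea behind expanding an index label can be sketched without pandas: an integer index selects one of several sub-feature columns, and the label becomes a one-hot row over those columns. `expand_index_labels` and the column names are hypothetical; the real method operates on a pandas.DataFrame and its exact mapping from indices to columns may differ.

```python
from typing import Dict, List, Sequence


# Hypothetical helper illustrating the idea; not the framework's implementation.
def expand_index_labels(indices: Sequence[int], columns: Sequence[str]) -> List[Dict[str, int]]:
    """Turn integer label indices into one-hot rows over the given sub-feature columns."""
    rows: List[Dict[str, int]] = []
    for idx in indices:
        if not 0 <= idx < len(columns):
            raise ValueError(f"label index {idx} out of range for {len(columns)} columns")
        # Exactly one column gets a 1: the one at position `idx`.
        rows.append({col: int(pos == idx) for pos, col in enumerate(columns)})
    return rows


cols = ["race_White", "race_Black", "race_Other"]  # made-up column names
print(expand_index_labels([1, 0], cols))
# [{'race_White': 0, 'race_Black': 1, 'race_Other': 0}, {'race_White': 1, 'race_Black': 0, 'race_Other': 0}]
```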

Parameters

  • label (pandas.DataFrame) –

  • label_type (Literal['s', 'y']) –

Return type


property feature_split:

Return a feature split dictionary.

This should have separate entries for the features, the labels and the sensitive attributes.
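A minimal sketch of how such a split could be built from plain lists of column names. It assumes the dictionary uses the keys "x", "s" and "y", and that the sensitive-attribute and label columns must be removed from the feature list (compare features_to_remove); `make_feature_split` and the column names are hypothetical.

```python
from typing import Dict, List, Sequence


# Hypothetical helper; the key names "x", "s", "y" are an assumption.
def make_feature_split(
    features: Sequence[str], sens_attrs: Sequence[str], class_labels: Sequence[str]
) -> Dict[str, List[str]]:
    """Split column names into x (features), s (sensitive attributes) and y (class labels)."""
    # Columns used as s or y are removed from the x features.
    to_remove = set(sens_attrs) | set(class_labels)
    return {
        "x": [f for f in features if f not in to_remove],
        "s": list(sens_attrs),
        "y": list(class_labels),
    }


split = make_feature_split(
    features=["age", "hours", "sex", "income"], sens_attrs=["sex"], class_labels=["income"]
)
print(split)  # {'x': ['age', 'hours'], 's': ['sex'], 'y': ['income']}
```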

property features_to_remove: List[str]

Features that have to be removed from x.

property filepath: pathlib.Path

Filepath from which to load the data.

abstract load(ordered=False, labels_as_features=False)

Load the dataset.

Parameters

  • ordered (bool) –

  • labels_as_features (bool) –

Return type


property name: str

Name of the dataset.

property ordered_features:

Return an ordered features dictionary.

This should have separate entries for the features, the labels and the sensitive attributes, with the x features ordered so that the discrete features come first, followed by the continuous ones.
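The reordering of the x features amounts to a stable partition: discrete features keep their relative order and move to the front, continuous ones follow. The sketch assumes that any x feature not listed as continuous counts as discrete; `order_features` and the column names are hypothetical.

```python
from typing import List, Sequence


# Hypothetical helper; assumes every non-continuous x feature is discrete.
def order_features(x_features: Sequence[str], continuous: Sequence[str]) -> List[str]:
    """Return the x features with the discrete ones first, then the continuous ones.

    The relative order within each group is preserved.
    """
    cont = set(continuous)
    discrete_part = [f for f in x_features if f not in cont]
    continuous_part = [f for f in x_features if f in cont]
    return discrete_part + continuous_part


print(order_features(["age", "workclass_Private", "hours", "education_HS"], continuous=["age", "hours"]))
# ['workclass_Private', 'education_HS', 'age', 'hours']
```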

property sens_attrs: List[str]

Get the list of sensitive attributes.

class FeatureSplit(_typename, _fields=None, /, **kwargs)

Bases: dict

A dictionary mapping each feature group to the list of columns that belong to it.
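The signature above, (_typename, _fields=None, /, **kwargs), together with dict as the base, matches what typing.TypedDict exposes, which suggests FeatureSplit is declared as a TypedDict. Assuming that, and assuming its keys are x, s and y, an equivalent declaration could look like the sketch below; `FeatureSplitSketch` is a hypothetical stand-in, not the framework's class.

```python
from typing import List, TypedDict


class FeatureSplitSketch(TypedDict):
    """Hypothetical stand-in for FeatureSplit: column names per feature group."""

    x: List[str]  # non-sensitive input features
    s: List[str]  # sensitive attributes
    y: List[str]  # class labels


split_example: FeatureSplitSketch = {"x": ["age", "hours"], "s": ["sex"], "y": ["income"]}
# TypedDict only adds information for static type checkers; at runtime the value is a
# plain dict, which is why all the dict methods listed for FeatureSplit are available.
print(type(split_example) is dict)  # True
print(sorted(split_example))  # ['s', 'x', 'y']
```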


Return len(self).

clear() → None. Remove all items from D.
copy() → a shallow copy of D
fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.


Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
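Because dictionaries preserve insertion order (guaranteed since Python 3.7), popitem removes the most recently inserted pair first. The keys and values below are made up for illustration.

```python
# Illustrative values only.
groups = {"x": ["age"], "s": ["sex"], "y": ["income"]}
print(groups.popitem())  # ('y', ['income'])  -- last inserted, first out
print(groups.popitem())  # ('s', ['sex'])
print(groups)  # {'x': ['age']}
```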

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]
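The three cases of update can be seen side by side in the short example below; the keys and values are made up.

```python
# Illustrative values only.
merged = {"x": ["age"]}

# E has a .keys() method (a mapping): for k in E: D[k] = E[k]
merged.update({"s": ["race"]})

# E lacks .keys() (an iterable of pairs): for k, v in E: D[k] = v
merged.update([("y", ["income"])])

# Keyword arguments (F) are applied after E, so they win on clashes.
merged.update({"s": ["race"]}, s=["sex"])

print(merged)  # {'x': ['age'], 's': ['sex'], 'y': ['income']}
```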

values() → an object providing a view on D's values
class LoadableDataset(name, filename_or_path, features, cont_features, sens_attr_spec, class_label_spec, discrete_only, num_samples, discard_non_one_hot=False, map_to_binary=False, s_prefix=None, class_label_prefix=None, discrete_feature_groups=None)


Dataset that uses the default load function.
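LoadableDataset fills in the abstract load of Dataset with a concrete default. The pattern is the usual abstract-base-class one, sketched below with hypothetical classes (`BaseDatasetSketch`, `CsvDatasetSketch`) that read CSV text held in memory rather than a file; this is not the framework's implementation.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseDatasetSketch(ABC):
    """Hypothetical stand-in for an abstract dataset with an abstract load()."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def load(self) -> List[Dict[str, str]]:
        """Subclasses decide how the data is loaded."""


class CsvDatasetSketch(BaseDatasetSketch):
    """Concrete subclass providing a default load() that parses CSV text."""

    def __init__(self, name: str, csv_text: str) -> None:
        super().__init__(name)
        self._csv_text = csv_text

    def load(self) -> List[Dict[str, str]]:
        # csv.DictReader yields one dict per row, keyed by the header line.
        return list(csv.DictReader(io.StringIO(self._csv_text)))


ds = CsvDatasetSketch("toy", "age,sex,income\n39,1,0\n50,0,1\n")
print(ds.load()[0])  # {'age': '39', 'sex': '1', 'income': '0'}
```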

Parameters

  • name (str) –

  • filename_or_path (Union[str, pathlib.Path]) –

  • features (Sequence[str]) –

  • cont_features (Sequence[str]) –

  • sens_attr_spec (Union[str, Mapping[str,]]) –

  • class_label_spec (Union[str, Mapping[str,]]) –

  • discrete_only (bool) –

  • num_samples (int) –

  • discard_non_one_hot (bool) –

  • map_to_binary (bool) –

  • s_prefix (Optional[Sequence[str]]) –

  • class_label_prefix (Optional[Sequence[str]]) –

  • discrete_feature_groups (Optional[Dict[str, List[str]]]) –

Return type



Number of elements in the dataset.

Return type


property class_labels: List[str]

Get the list of class labels.

property continuous_features: List[str]

List of features that are continuous.

property disc_feature_groups: Optional[Dict[str, List[str]]]

Dictionary of feature groups.

property discrete_features: List[str]

List of features that are discrete.

expand_labels(label, label_type)

Expand a label given as an index into all of its sub-features.

Parameters

  • label (pandas.DataFrame) –

  • label_type (Literal['s', 'y']) –

Return type


property feature_split:

Return a feature split dictionary.

This should have separate entries for the features, the labels and the sensitive attributes.

property features_to_remove: List[str]

Features that have to be removed from x.

property filepath: pathlib.Path

Filepath from which to load the data.

load(ordered=False, labels_as_features=False)

Load the dataset from its CSV file.
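The effect of the ordered and labels_as_features parameters on which columns end up in x can be sketched on rows that have already been parsed from a CSV file. `DataTupleSketch` and `split_rows` are hypothetical, and the sketch assumes that with labels_as_features the s and y columns are copied into x while also remaining in their own groups.

```python
from typing import Dict, List, NamedTuple, Sequence


class DataTupleSketch(NamedTuple):
    """Hypothetical stand-in for DataTuple: one list of row values per group."""

    x: List[List[str]]
    s: List[List[str]]
    y: List[List[str]]


def split_rows(
    rows: Sequence[Dict[str, str]],
    features: Sequence[str],
    continuous: Sequence[str],
    s_cols: Sequence[str],
    y_cols: Sequence[str],
    ordered: bool = False,
    labels_as_features: bool = False,
) -> DataTupleSketch:
    """Select x, s and y columns from parsed CSV rows (illustrative only)."""
    x_cols = list(features)
    if ordered:
        # Stable partition: discrete features first, continuous ones last.
        cont = set(continuous)
        x_cols = [c for c in x_cols if c not in cont] + [c for c in x_cols if c in cont]
    if labels_as_features:
        # Assumption: s and y stay in their own groups but are also copied into x.
        x_cols += list(s_cols) + list(y_cols)

    def select(cols: Sequence[str]) -> List[List[str]]:
        return [[row[c] for c in cols] for row in rows]

    return DataTupleSketch(x=select(x_cols), s=select(s_cols), y=select(y_cols))


rows = [{"age": "39", "workclass": "Private", "sex": "1", "income": "0"}]
dt = split_rows(rows, ["age", "workclass"], ["age"], ["sex"], ["income"],
                ordered=True, labels_as_features=True)
print(dt.x)  # [['Private', '39', '1', '0']]
```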

Parameters

  • ordered (bool) – If True, return the features ordered so that the discrete features come first, followed by the continuous ones.

  • labels_as_features (bool) – If True, include the s and y labels in the x features.

Returns

A DataTuple with DataFrames of the features, the labels and the sensitive attributes.

Return type



Load the dataset as an AIF360 dataset.

Experimental. Requires the aif360 library.

The type check is ignored because the return type is not yet defined.

property name: str

Name of the dataset.

property ordered_features:

Return an ordered features dictionary.

This should have separate entries for the features, the labels and the sensitive attributes, with the x features ordered so that the discrete features come first, followed by the continuous ones.

property sens_attrs: List[str]

Get the list of sensitive attributes.