Datasets
This module contains data-related items, such as raw CSV files and dataset objects.
Dataset base
Data structure for all datasets that come with the framework.
- class Dataset(name, filename_or_path, features, cont_features, sens_attr_spec, class_label_spec, discrete_only, num_samples, discard_non_one_hot=False, map_to_binary=False, s_prefix=None, class_label_prefix=None, discrete_feature_groups=None)
Bases: object
Data structure that holds all the information needed to load a given dataset.
- Parameters
discard_non_one_hot (bool) – If some entries in s or y are not correctly one-hot encoded, discard those.
map_to_binary (bool) – If True, convert labels from {-1, 1} to {0, 1}.
name (str) –
filename_or_path (Union[str, pathlib.Path]) –
features (Sequence[str]) –
cont_features (Sequence[str]) –
sens_attr_spec (Union[str, Mapping[str, ethicml.data.util.LabelGroup]]) –
class_label_spec (Union[str, Mapping[str, ethicml.data.util.LabelGroup]]) –
discrete_only (bool) –
num_samples (int) –
s_prefix (Optional[Sequence[str]]) –
class_label_prefix (Optional[Sequence[str]]) –
discrete_feature_groups (Optional[Dict[str, List[str]]]) –
- Return type
None
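The two documented flags, map_to_binary and discard_non_one_hot, can be illustrated with a minimal sketch on plain Python lists. The helper names below are hypothetical and are not part of EthicML; the library itself works on pandas data, and its exact behaviour may differ.

```python
from typing import List, Sequence


def map_labels_to_binary(labels: Sequence[int]) -> List[int]:
    """Map labels from {-1, 1} to {0, 1} (hypothetical helper, not EthicML API)."""
    return [(label + 1) // 2 for label in labels]


def keep_valid_one_hot(rows: Sequence[Sequence[int]]) -> List[List[int]]:
    """Keep only rows that are valid one-hot vectors: exactly one 1, the rest 0."""
    return [
        list(row)
        for row in rows
        if all(value in (0, 1) for value in row) and sum(row) == 1
    ]


binary = map_labels_to_binary([-1, 1, 1, -1])
valid = keep_valid_one_hot([[1, 0], [0, 0], [1, 1], [0, 1]])
```

In this sketch the all-zero row and the row with two ones are dropped, which is one plausible reading of "not correctly one-hot encoded".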
- __len__()
Number of elements in the dataset.
- Return type
int
- property class_labels: List[str]
Get the list of class labels.
- property continuous_features: List[str]
List of features that are continuous.
- property disc_feature_groups: Optional[Dict[str, List[str]]]
Dictionary of feature groups.
- property discrete_features: List[str]
List of features that are discrete.
- expand_labels(label, label_type)
Expand a label in the form of an index into all the subfeatures.
- Parameters
label (pandas.DataFrame) –
label_type (Literal['s', 'y']) –
- Return type
pandas.DataFrame
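One way to read "expand a label in the form of an index into all the subfeatures" is as a one-hot expansion: each integer index selects one of the sub-feature columns. The sketch below assumes that reading and uses plain dictionaries instead of a pandas.DataFrame; the function and the column names are hypothetical.

```python
from typing import Dict, List, Sequence


def expand_index_labels(indices: Sequence[int], columns: Sequence[str]) -> List[Dict[str, int]]:
    """Turn each integer index into a one-hot row over the named columns."""
    rows = []
    for index in indices:
        if not 0 <= index < len(columns):
            raise ValueError(f"index {index} has no matching column")
        # Exactly one column gets 1: the one whose position equals the index.
        rows.append({col: int(pos == index) for pos, col in enumerate(columns)})
    return rows


expanded = expand_index_labels([0, 2], ["race_a", "race_b", "race_c"])
```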
- property feature_split: ethicml.data.dataset.FeatureSplit
Return a feature split dictionary.
This should have separate entries for the features, the labels and the sensitive attributes.
- property features_to_remove: List[str]
Features that have to be removed from x.
- property filepath: pathlib.Path
Filepath from which to load the data.
- abstract load(ordered=False, labels_as_features=False)
Load the dataset.
- Parameters
ordered (bool) –
labels_as_features (bool) –
- Return type
DataTuple
- property name: str
Name of the dataset.
- property ordered_features: ethicml.data.dataset.FeatureSplit
Return an ordered features dictionary.
This should have separate entries for the features, the labels and the sensitive attributes, with the x features ordered so that the discrete features come first, followed by the continuous ones.
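The ordering described here (discrete features first, then continuous) can be sketched as a stable partition of the feature list. The keys "x", "s" and "y" and all column names are assumptions made for illustration; they are not taken from this page.

```python
from typing import List, Sequence


def order_x_features(features: Sequence[str], continuous: Sequence[str]) -> List[str]:
    """Return the features with discrete ones first, then continuous ones.

    The relative order inside each group is preserved.
    """
    continuous_set = set(continuous)
    discrete = [f for f in features if f not in continuous_set]
    cont = [f for f in features if f in continuous_set]
    return discrete + cont


ordered_split = {
    "x": order_x_features(["age", "job_manual", "hours", "job_office"], ["age", "hours"]),
    "s": ["sex"],
    "y": ["income"],
}
```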
- property sens_attrs: List[str]
Get the list of sensitive attributes.
- class FeatureSplit(_typename, _fields=None, /, **kwargs)
Bases: dict
A dictionary of the list of columns that belong to the feature groups.
- __len__()
Return len(self).
- clear() → None
Remove all items from D.
- copy() → a shallow copy of D
- fromkeys(iterable, value=None, /)
Create a new dictionary with keys from iterable and values set to value.
- get(key, default=None, /)
Return the value for key if key is in the dictionary, else default.
- items() → a set-like object providing a view on D's items
- keys() → a set-like object providing a view on D's keys
- pop(k[, d]) → v
Remove the specified key and return the corresponding value.
If the key is not found, return the default if given; otherwise, raise a KeyError.
- popitem()
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key, default=None, /)
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) → None
Update D from dict/iterable E and F.
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
- values() → an object providing a view on D's values
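Because FeatureSplit derives from dict, the inherited methods listed above apply to it directly. The following sketch models such a dictionary with typing.TypedDict; the class name FeatureSplitSketch and the keys "x", "s" and "y" are assumptions for illustration, not the definition used by the library.

```python
from typing import List, TypedDict


class FeatureSplitSketch(TypedDict):
    """Hypothetical stand-in: one list of column names per feature group."""

    x: List[str]  # input features
    s: List[str]  # sensitive attributes
    y: List[str]  # class labels


split: FeatureSplitSketch = {"x": ["age", "hours"], "s": ["sex"], "y": ["income"]}

# At runtime a TypedDict instance is an ordinary dict, so dict methods work as usual.
all_columns = split["x"] + split["s"] + split["y"]
n_groups = len(split)
sensitive = split.get("s", [])
```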
- class LoadableDataset(name, filename_or_path, features, cont_features, sens_attr_spec, class_label_spec, discrete_only, num_samples, discard_non_one_hot=False, map_to_binary=False, s_prefix=None, class_label_prefix=None, discrete_feature_groups=None)
Bases: ethicml.data.dataset.Dataset
Dataset that uses the default load function.
- Parameters
name (str) –
filename_or_path (Union[str, pathlib.Path]) –
features (Sequence[str]) –
cont_features (Sequence[str]) –
sens_attr_spec (Union[str, Mapping[str, ethicml.data.util.LabelGroup]]) –
class_label_spec (Union[str, Mapping[str, ethicml.data.util.LabelGroup]]) –
discrete_only (bool) –
num_samples (int) –
discard_non_one_hot (bool) –
map_to_binary (bool) –
s_prefix (Optional[Sequence[str]]) –
class_label_prefix (Optional[Sequence[str]]) –
discrete_feature_groups (Optional[Dict[str, List[str]]]) –
- Return type
None
- __len__()
Number of elements in the dataset.
- Return type
int
- property class_labels: List[str]
Get the list of class labels.
- property continuous_features: List[str]
List of features that are continuous.
- property disc_feature_groups: Optional[Dict[str, List[str]]]
Dictionary of feature groups.
- property discrete_features: List[str]
List of features that are discrete.
- expand_labels(label, label_type)
Expand a label in the form of an index into all the subfeatures.
- Parameters
label (pandas.DataFrame) –
label_type (Literal['s', 'y']) –
- Return type
pandas.DataFrame
- property feature_split: ethicml.data.dataset.FeatureSplit
Return a feature split dictionary.
This should have separate entries for the features, the labels and the sensitive attributes.
- property features_to_remove: List[str]
Features that have to be removed from x.
- property filepath: pathlib.Path
Filepath from which to load the data.
- load(ordered=False, labels_as_features=False)
Load dataset from its CSV file.
- Parameters
ordered (bool) – if True, return features such that discrete come first, then continuous
labels_as_features (bool) – if True, the s and y labels are included in the x features
- Returns
DataTuple with dataframes of features, labels and sensitive attributes
- Return type
DataTuple
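The effect of the ordered and labels_as_features flags can be sketched with the standard csv module. This is not the library's implementation: TupleSketch is a hypothetical stand-in for DataTuple, load_sketch and its extra parameters are invented for the example, and values stay as strings rather than being parsed into dataframes.

```python
import csv
import io
from typing import Dict, List, NamedTuple, Sequence

Row = Dict[str, str]


class TupleSketch(NamedTuple):
    """Hypothetical stand-in for the DataTuple that load() returns."""

    x: List[Row]
    s: List[Row]
    y: List[Row]


def load_sketch(
    csv_text: str,
    x_cols: Sequence[str],
    s_cols: Sequence[str],
    y_cols: Sequence[str],
    cont_cols: Sequence[str] = (),
    ordered: bool = False,
    labels_as_features: bool = False,
) -> TupleSketch:
    """Split CSV rows into features (x), sensitive attributes (s) and labels (y)."""
    x_cols = list(x_cols)
    if ordered:
        # Discrete columns first, then continuous ones.
        cont = set(cont_cols)
        x_cols = [c for c in x_cols if c not in cont] + [c for c in x_cols if c in cont]
    if labels_as_features:
        # Append the s and y columns to the x features.
        x_cols = x_cols + list(s_cols) + list(y_cols)
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    def select(cols: Sequence[str]) -> List[Row]:
        return [{c: row[c] for c in cols} for row in rows]

    return TupleSketch(x=select(x_cols), s=select(s_cols), y=select(y_cols))


CSV_TEXT = "age,job_manual,sex,income\n39,1,0,1\n50,0,1,0\n"
data = load_sketch(
    CSV_TEXT, ["age", "job_manual"], ["sex"], ["income"], cont_cols=["age"], ordered=True
)
```

With ordered=True the discrete column job_manual is placed before the continuous column age in every x row.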
- load_aif()
Load the dataset as an AIF360 dataset.
Experimental. Requires the aif360 library.
Ignores the type check as the return type is not yet defined.
- property name: str
Name of the dataset.
- property ordered_features: ethicml.data.dataset.FeatureSplit
Return an ordered features dictionary.
This should have separate entries for the features, the labels and the sensitive attributes, with the x features ordered so that the discrete features come first, followed by the continuous ones.
- property sens_attrs: List[str]
Get the list of sensitive attributes.