Utils#
Loading#
Load data from .csv files.
- create_data_obj(filepath, s_column, y_column, additional_to_drop=None)#
Create a ConfigurableDataset from the given file.
- Parameters
filepath (pathlib.Path) – path to a CSV file
s_column (str) – column that represents sensitive attributes
y_column (str) – column that contains labels
additional_to_drop (Optional[List[str]]) – other columns that should be dropped
- Returns
Dataset object
- Return type
ethicml.data.load.ConfigurableDataset
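To illustrate how the three column arguments partition a CSV file, here is a minimal standard-library sketch. It does not call EthicML: `split_columns` and the sample data are hypothetical, and the real ConfigurableDataset may treat the columns differently.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

Rows = List[Dict[str, str]]


def split_columns(
    csv_text: str,
    s_column: str,
    y_column: str,
    additional_to_drop: Optional[List[str]] = None,
) -> Tuple[Rows, List[str], List[str]]:
    """Split CSV rows into (features, sensitive values, labels).

    Hypothetical helper for illustration only; not part of EthicML.
    """
    dropped = {s_column, y_column, *(additional_to_drop or [])}
    features: Rows = []
    sensitive: List[str] = []
    labels: List[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sensitive.append(row[s_column])
        labels.append(row[y_column])
        # Every column that is not s, y, or explicitly dropped is a feature.
        features.append({k: v for k, v in row.items() if k not in dropped})
    return features, sensitive, labels


SAMPLE = "id,age,sex,hired\n1,34,0,1\n2,51,1,0\n"
x, s, y = split_columns(
    SAMPLE, s_column="sex", y_column="hired", additional_to_drop=["id"]
)
```

Here `x` holds only the `age` column, because `sex` and `hired` were set aside as the sensitive attribute and the label, and `id` was dropped explicitly.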
- load_data(dataset, ordered=False)#
Load dataset from its CSV file.
This function only exists for backwards compatibility. Use dataset.load() instead.
- Parameters
dataset (ethicml.data.dataset.Dataset) – dataset object
ordered (bool) – if True, return features such that discrete come first, then continuous
- Returns
DataTuple with dataframes of features, labels and sensitive attributes
- Return type
DataTuple
Lookup#
Lookup tables / switch statements for project level objects.
- available_tabular()#
List of tabular dataset names.
- Return type
List[str]
- get_dataset_obj_by_name(name)#
Given a dataset name, get the corresponding dataset object.
- Parameters
name (str) –
- Return type
Callable[[], ethicml.data.dataset.Dataset]
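Because the return type is a zero-argument callable rather than a dataset instance, the caller has to invoke the result to obtain the dataset. A minimal sketch of such a name-to-factory lookup follows. Every name in it (`ToyDataset`, `_REGISTRY`, `available_names`, `get_factory_by_name`) is hypothetical, and raising `ValueError` for unknown names is an assumption, not documented EthicML behaviour.

```python
from typing import Callable, Dict, List


class ToyDataset:
    """Hypothetical stand-in for a dataset class."""

    def __init__(self, name: str) -> None:
        self.name = name


def _make_toy() -> ToyDataset:
    return ToyDataset("toy")


def _make_other() -> ToyDataset:
    return ToyDataset("other")


# Lookup table from dataset name to a zero-argument factory.
_REGISTRY: Dict[str, Callable[[], ToyDataset]] = {
    "toy": _make_toy,
    "other": _make_other,
}


def available_names() -> List[str]:
    """List the registered dataset names."""
    return list(_REGISTRY)


def get_factory_by_name(name: str) -> Callable[[], ToyDataset]:
    """Return the factory for ``name`` (assumed to raise on unknown names)."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown dataset: {name!r}") from None


factory = get_factory_by_name("toy")
dataset = factory()  # the factory must be called to build the object
```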
Other#
Useful methods that are used in some of the data objects.
- class LabelGroup(columns, multiplier=1)#
Bases: NamedTuple
Definition of a group of columns that should be interpreted as a single label.
Create new instance of LabelGroup(columns, multiplier)
- Parameters
columns (List[str]) –
multiplier (int) –
- __len__()#
Return len(self).
- columns: List[str]#
Alias for field number 0
- count(value, /)#
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)#
Return first index of value.
Raises ValueError if the value is not present.
- multiplier: int#
Alias for field number 1
- deprecated(func)#
Decorator that marks a function as deprecated.
A warning is emitted each time the decorated function is called.
- Parameters
func (ethicml.data.util._F) –
- Return type
Any
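A decorator of this kind is commonly built from `warnings.warn` and `functools.wraps`. The sketch below shows that general pattern. `deprecated_sketch` and `old_add` are hypothetical names, and the warning category and message are assumptions that may differ from EthicML's own decorator.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def deprecated_sketch(func: F) -> F:
    """Emit a DeprecationWarning whenever ``func`` is called (illustrative)."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        warnings.warn(
            f"Call to deprecated function {func.__name__}.",
            category=DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return func(*args, **kwargs)

    return wrapper  # type: ignore[return-value]


@deprecated_sketch
def old_add(a: int, b: int) -> int:
    return a + b


# Record warnings so the effect of the decorator can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_add(1, 2)
```

The call still returns its normal result; the only visible difference is the recorded warning.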
- filter_features_by_prefixes(features, prefixes)#
Filter the features by prefixes.
- Parameters
features (Sequence[str]) – list of feature names
prefixes (Sequence[str]) – list of prefixes
- Returns
filtered feature names
- Return type
List[str]
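The description does not state whether matching features are kept or dropped. The sketch below assumes they are dropped, which fits the use of prefixes to exclude whole groups of one-hot columns. `filter_by_prefixes_sketch` is a hypothetical re-implementation, not the library function.

```python
from typing import List, Sequence


def filter_by_prefixes_sketch(
    features: Sequence[str], prefixes: Sequence[str]
) -> List[str]:
    """Return the features that start with none of the prefixes.

    Assumption: a matching prefix removes the feature.
    """
    return [
        name
        for name in features
        if not any(name.startswith(prefix) for prefix in prefixes)
    ]


remaining = filter_by_prefixes_sketch(
    ["age", "race_White", "race_Black", "sex_Male"], ["race"]
)
# remaining == ["age", "sex_Male"]
```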
- get_discrete_features(all_feats, feats_to_remove, cont_feats)#
Get a list of the discrete features in a dataset.
- Parameters
all_feats (List[str]) – List of all features in the dataset.
feats_to_remove (List[str]) – List of features that aren’t used.
cont_feats (List[str]) – List of continuous features in the dataset.
- Returns
List of features not marked as continuous or to be removed.
- Return type
List[str]
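The return value described above amounts to "all features minus the removed ones minus the continuous ones". A minimal sketch, assuming exact name matching (the library may instead match `feats_to_remove` by prefix); `discrete_features_sketch` is a hypothetical name:

```python
from typing import List


def discrete_features_sketch(
    all_feats: List[str], feats_to_remove: List[str], cont_feats: List[str]
) -> List[str]:
    """Keep features that are neither removed nor continuous, in order."""
    excluded = set(feats_to_remove) | set(cont_feats)
    return [feat for feat in all_feats if feat not in excluded]


discrete = discrete_features_sketch(
    all_feats=["age", "id", "sex_Male", "sex_Female"],
    feats_to_remove=["id"],
    cont_feats=["age"],
)
# discrete == ["sex_Male", "sex_Female"]
```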
- group_disc_feat_indexes(disc_feat_names, prefix_sep='_')#
Group discrete feature names according to the first segment of their names.
Returns a list of their corresponding slices (assumes order is maintained).
- Parameters
disc_feat_names (List[str]) –
prefix_sep (str) –
- Return type
List[slice]
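The grouping can be pictured as follows: consecutive names that share the text before the first separator form one group, and each group becomes a slice over the feature positions. This is an illustrative sketch with the hypothetical name `group_slices_sketch`; it relies on the same assumption as the description, namely that features of one group are adjacent.

```python
from itertools import groupby
from typing import List


def group_slices_sketch(names: List[str], prefix_sep: str = "_") -> List[slice]:
    """Return one slice per run of names sharing the same first segment."""
    slices: List[slice] = []
    start = 0
    # groupby only merges adjacent items, so the input order matters.
    for _, run in groupby(names, key=lambda n: n.split(prefix_sep)[0]):
        length = len(list(run))
        slices.append(slice(start, start + length))
        start += length
    return slices


feature_names = ["race_White", "race_Black", "sex_Male", "sex_Female", "age"]
groups = group_slices_sketch(feature_names)
# groups == [slice(0, 2), slice(2, 4), slice(4, 5)]
```

Each slice can then index the feature list or a column range directly, for example `feature_names[groups[1]]` gives the two `sex` columns.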
- label_spec_to_feature_list(spec)#
Extract all the feature column names from a dictionary of label specifications.
- Parameters
spec (Mapping[str, ethicml.data.util.LabelGroup]) –
- Return type
List[str]
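As an illustration of the data shape involved, the sketch below defines a local stand-in for LabelGroup and flattens a specification into a single list of column names. `LabelGroupSketch` and `spec_to_feature_list_sketch` are hypothetical, and the multiplier value in the example is arbitrary.

```python
from typing import Dict, List, Mapping, NamedTuple


class LabelGroupSketch(NamedTuple):
    """Local stand-in mirroring the documented fields of LabelGroup."""

    columns: List[str]
    multiplier: int = 1


def spec_to_feature_list_sketch(
    spec: Mapping[str, LabelGroupSketch]
) -> List[str]:
    """Concatenate the columns of every group, in the mapping's order."""
    feature_list: List[str] = []
    for group in spec.values():
        feature_list.extend(group.columns)
    return feature_list


spec: Dict[str, LabelGroupSketch] = {
    "sex": LabelGroupSketch(["sex_Male", "sex_Female"]),
    "race": LabelGroupSketch(["race_White", "race_Black"], multiplier=2),
}
flat = spec_to_feature_list_sketch(spec)
# flat == ["sex_Male", "sex_Female", "race_White", "race_Black"]
```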
- simple_spec(label_defs)#
Create label specs for the most common case where columns contain 0s and 1s.
- Parameters
label_defs (Mapping[str, Sequence[str]]) –
- Return type
Mapping[str, ethicml.data.util.LabelGroup]