Utils#

Loading#

Load data from .csv files.

create_data_obj(filepath, s_column, y_column, additional_to_drop=None)#

Create a ConfigurableDataset from the given file.

Parameters
  • filepath (pathlib.Path) – path to a CSV file

  • s_column (str) – column that represents sensitive attributes

  • y_column (str) – column that contains labels

  • additional_to_drop (Optional[List[str]]) – other columns that should be dropped

Returns

Dataset object

Return type

ethicml.data.load.ConfigurableDataset

load_data(dataset, ordered=False)#

Load dataset from its CSV file.

This function only exists for backwards compatibility. Use dataset.load() instead.

Parameters
  • dataset (ethicml.data.dataset.Dataset) – dataset object

  • ordered (bool) – if True, return the features ordered so that discrete features come first, followed by continuous ones

Returns

DataTuple with dataframes of features, labels and sensitive attributes

Return type

ethicml.utility.data_structures.DataTuple

Lookup#

Lookup tables / switch statements for project level objects.

available_tabular()#

List of tabular dataset names.

Return type

List[str]

get_dataset_obj_by_name(name)#

Given a dataset name, get the corresponding dataset object.

Parameters

name (str) – name of the dataset

Return type

Callable[[], ethicml.data.dataset.Dataset]
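Note that the return type is a callable that builds the dataset, not a dataset instance. The following is a minimal sketch of such a lookup table, not the library's implementation: `DatasetSketch`, the two factory functions, `_REGISTRY`, and the error raised for unknown names are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List


class DatasetSketch:
    """Hypothetical stand-in for a dataset class."""

    def __init__(self, name: str) -> None:
        self.name = name


def adult_sketch() -> DatasetSketch:
    return DatasetSketch("adult")


def compas_sketch() -> DatasetSketch:
    return DatasetSketch("compas")


# Lookup table from dataset name to a zero-argument factory (assumed layout).
_REGISTRY: Dict[str, Callable[[], DatasetSketch]] = {
    "adult": adult_sketch,
    "compas": compas_sketch,
}


def available_sketch() -> List[str]:
    """Return the registered dataset names."""
    return list(_REGISTRY)


def get_by_name_sketch(name: str) -> Callable[[], DatasetSketch]:
    """Return the factory registered under the given name."""
    if name not in _REGISTRY:
        # Error handling here is an assumption of this sketch.
        raise KeyError(f"Unknown dataset: {name}")
    return _REGISTRY[name]


factory = get_by_name_sketch("adult")
dataset = factory()  # the factory must be called to obtain the dataset object
```

Returning the factory rather than an instance lets the caller decide when the dataset object is actually constructed.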

Other#

Useful methods that are used in some of the data objects.

class LabelGroup(columns, multiplier=1)#

Bases: NamedTuple

Definition of a group of columns that should be interpreted as a single label.

Create new instance of LabelGroup(columns, multiplier)

Parameters
  • columns (List[str]) – names of the columns that together form the label

  • multiplier (int) –

__len__()#

Return len(self).

columns: List[str]#

Alias for field number 0

count(value, /)#

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)#

Return first index of value.

Raises ValueError if the value is not present.

multiplier: int#

Alias for field number 1
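Because LabelGroup is a NamedTuple, the inherited tuple methods (`__len__`, `count`, `index`) operate on the two fields rather than on the column names. The sketch below illustrates this with a stand-in class, `LabelGroupSketch`, which is a hypothetical name that only mirrors the documented fields; the column names are invented for the example.

```python
from typing import List, NamedTuple


class LabelGroupSketch(NamedTuple):
    """Stand-in with the same fields as LabelGroup (for illustration only)."""

    columns: List[str]
    multiplier: int = 1


group = LabelGroupSketch(columns=["race_white", "race_black", "race_asian"])

# len() counts the tuple's fields (columns, multiplier), not the columns.
num_fields = len(group)
num_columns = len(group.columns)

# Fields can be read by name or by position ("field number 0" is columns).
first_field = group[0]

# index() and count() search the field values, as for any tuple.
multiplier_position = group.index(1)
multiplier_count = group.count(1)
```

To get the number of columns in a group, use `len(group.columns)` rather than `len(group)`.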

deprecated(func)#

Decorator that marks a function as deprecated.

A warning is emitted whenever the decorated function is called.

Parameters

func (ethicml.data.util._F) – function to mark as deprecated

Return type

Any
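A decorator of this kind is commonly built on the standard `warnings` module. The sketch below shows one way it could work; `deprecated_sketch` is a hypothetical name, and the warning category and message text are assumptions rather than what the library necessarily emits.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def deprecated_sketch(func: F) -> Any:
    """Wrap func so that every call emits a DeprecationWarning."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        warnings.warn(
            f"Call to deprecated function {func.__name__}.",
            category=DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return func(*args, **kwargs)

    return wrapper


@deprecated_sketch
def old_add(a: int, b: int) -> int:
    return a + b


# Record warnings so the effect of the decorator is visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_add(1, 2)
```

The decorated function still returns its normal result; the only added behavior is the warning.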

filter_features_by_prefixes(features, prefixes)#

Filter the features by prefixes.

Parameters
  • features (Sequence[str]) – list of feature names

  • prefixes (Sequence[str]) – list of prefixes

Returns

filtered feature names

Return type

List[str]
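The description does not say whether matching features are kept or dropped. The sketch below assumes that "filter" means removing every feature whose name starts with one of the prefixes; `filter_by_prefixes_sketch` is a hypothetical stand-in, not the library function, and the feature names are invented.

```python
from typing import List, Sequence


def filter_by_prefixes_sketch(features: Sequence[str], prefixes: Sequence[str]) -> List[str]:
    """Drop features whose name starts with any prefix (assumed semantics)."""
    # str.startswith accepts a tuple of candidate prefixes.
    prefix_tuple = tuple(prefixes)
    return [name for name in features if not name.startswith(prefix_tuple)]


features = ["age", "race_white", "race_black", "sex_male", "income"]
remaining = filter_by_prefixes_sketch(features, ["race", "sex"])
```

Under this assumption the order of the remaining features is preserved, and an empty prefix list leaves the input unchanged.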

get_discrete_features(all_feats, feats_to_remove, cont_feats)#

Get a list of the discrete features in a dataset.

Parameters
  • all_feats (List[str]) – List of all features in the dataset.

  • feats_to_remove (List[str]) – List of features that aren’t used.

  • cont_feats (List[str]) – List of continuous features in the dataset.

Returns

List of features not marked as continuous or to be removed.

Return type

List[str]
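As a rough illustration of the return value, the sketch below keeps every feature that appears in neither `feats_to_remove` nor `cont_feats`. It assumes exact name matching, which may differ from the library (for example, it might treat `feats_to_remove` as prefixes); `discrete_features_sketch` is a hypothetical name.

```python
from typing import List


def discrete_features_sketch(
    all_feats: List[str], feats_to_remove: List[str], cont_feats: List[str]
) -> List[str]:
    """Return features that are neither removed nor continuous (exact matching assumed)."""
    excluded = set(feats_to_remove) | set(cont_feats)
    # Iterate over all_feats so the original column order is preserved.
    return [feat for feat in all_feats if feat not in excluded]


all_feats = ["age", "hours", "sex_male", "sex_female", "id"]
discrete = discrete_features_sketch(
    all_feats, feats_to_remove=["id"], cont_feats=["age", "hours"]
)
```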

group_disc_feat_indexes(disc_feat_names, prefix_sep='_')#

Group discrete feature names according to the first segment of their name.

Returns a list of their corresponding slices (assumes order is maintained).

Parameters
  • disc_feat_names (List[str]) – list of discrete feature names

  • prefix_sep (str) – separator between the first segment of a feature name and the rest

Return type

List[slice]
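The grouping described above can be sketched with `itertools.groupby`, which groups consecutive items sharing a key. This is an illustrative reimplementation under that reading, not the library's code; `group_indexes_sketch` and the feature names are hypothetical.

```python
import itertools
from typing import List


def group_indexes_sketch(disc_feat_names: List[str], prefix_sep: str = "_") -> List[slice]:
    """Return one slice per run of consecutive names sharing a first segment."""

    def first_segment(feature_name: str) -> str:
        return feature_name.split(prefix_sep)[0]

    slices: List[slice] = []
    start = 0
    # groupby only merges adjacent items, hence the assumption that order is maintained.
    for _, members in itertools.groupby(disc_feat_names, first_segment):
        length = len(list(members))
        slices.append(slice(start, start + length))
        start += length
    return slices


names = ["sex_Male", "sex_Female", "race_White", "race_Black", "race_Other"]
groups = group_indexes_sketch(names)
race_names = names[groups[1]]
```

Each slice can be applied to the feature list, or to the matching columns of an array, to select one group of one-hot columns.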

label_spec_to_feature_list(spec)#

Extract all the feature column names from a dictionary of label specifications.

Parameters

spec (Mapping[str, ethicml.data.util.LabelGroup]) –

Return type

List[str]
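Conceptually, this flattens the `columns` field of every group in the mapping into one list. The sketch below shows that idea with a stand-in `LabelGroupSketch` class; both it and `spec_to_feature_list_sketch` are hypothetical names, and the ordering of the result is an assumption.

```python
from typing import Dict, List, Mapping, NamedTuple


class LabelGroupSketch(NamedTuple):
    """Stand-in with the same fields as LabelGroup (for illustration only)."""

    columns: List[str]
    multiplier: int = 1


def spec_to_feature_list_sketch(spec: Mapping[str, LabelGroupSketch]) -> List[str]:
    """Concatenate the column names of all groups, in mapping order."""
    feature_list: List[str] = []
    for group in spec.values():
        feature_list.extend(group.columns)
    return feature_list


spec: Dict[str, LabelGroupSketch] = {
    "sex": LabelGroupSketch(["sex_male", "sex_female"]),
    "race": LabelGroupSketch(["race_white", "race_black"]),
}
feature_columns = spec_to_feature_list_sketch(spec)
```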

simple_spec(label_defs)#

Create label specs for the most common case where columns contain 0s and 1s.

Parameters

label_defs (Mapping[str, Sequence[str]]) –

Return type

Mapping[str, ethicml.data.util.LabelGroup]