Utils#

This module contains utilities that do not (yet) belong anywhere else.

Classes:

Activation

Base class for decision functions.

DataTuple

A tuple of dataframes for the features, the sensitive attribute and the class labels.

Heaviside

Decision function that accepts predictions with a score of 50% or above.

Prediction

Prediction of an algorithm.

ResultsAggregator

Aggregate results.

SoftPrediction

Prediction of an algorithm that makes soft predictions.

TestTuple

A tuple of dataframes for the features and the sensitive attribute.

TrainTestPair

2-Tuple of train and test data.

Functions:

aggregate_results

Aggregate results over the repeats.

concat_dt

Concatenate the data tuples in the given list.

concat_tt

Concatenate the test tuples in the given list.

filter_and_map_results

Filter entries and change the index with a mapping.

filter_results

Filter the entries based on the given values.

make_results

Initialise Results object.

map_over_results_index

Change the values of the index with a transformation function.

shuffle_df

Shuffle a given dataframe.

undo_one_hot

Undo one-hot encoding.

class Activation#

Bases: abc.ABC

Base class for decision functions.

abstract apply(soft_output)#

Apply the decision function to a soft prediction.

Parameters

soft_output (numpy.ndarray) – soft prediction (i.e. a probability or logits)

Returns

decision

Return type

numpy.ndarray

abstract get_name()#

Name of activation function.

Return type

str

class DataTuple(x, s, y, name=None)#

Bases: ethicml.utility.data_structures.TestTuple

A tuple of dataframes for the features, the sensitive attribute and the class labels.

Parameters
  • x (pd.DataFrame) – input features

  • s (pd.DataFrame) – sensitive attributes

  • y (pd.DataFrame) – class labels

  • name (Optional[str]) – optional name of the dataset

Make a DataTuple.

__len__()#

Override the __len__ magic method.

Return type

int

apply_to_joined_df(mapper)#

Concatenate the dataframes in the DataTuple and then apply a function to it.

classmethod from_npz(data_path)#

Load data tuple from npz file.

Parameters

data_path (pathlib.Path) – path of the npz file to load

Return type

ethicml.utility.data_structures.DataTuple

get_subset(num=500)#

Get the first elements of the dataset.

Parameters

num (int) – how many samples to take for subset

Returns

subset of training data

Return type

ethicml.utility.data_structures.DataTuple

property name: Optional[str]#

Getter for name property.

remove_y()#

Convert the DataTuple instance to a TestTuple instance.

Return type

ethicml.utility.data_structures.TestTuple

replace(*, x=None, s=None, name=None, y=None)#

Create a copy of the DataTuple but change the given values.

Parameters
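  • x (Optional[pd.DataFrame]) – new input features; the current ones are kept if None

  • s (Optional[pd.DataFrame]) – new sensitive attributes; the current ones are kept if None

  • name (Optional[str]) – new name; the current one is kept if None

  • y (Optional[pd.DataFrame]) – new class labels; the current ones are kept if None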

Return type

DataTuple

property s: pandas.DataFrame#

Getter for property s.

to_npz(data_path)#

Save DataTuple as an npz file.

Parameters

data_path (pathlib.Path) – path of the npz file to write

Return type

None

property x: pandas.DataFrame#

Getter for property x.

property y: pandas.DataFrame#

Getter for property y.
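
The container behaviour described above (length, subset, copy with changes) can be sketched without the library. This is a minimal illustration, not the EthicML implementation: `DataTupleSketch` is a hypothetical stand-in, and it assumes that the length is the number of rows in `x` and that `get_subset` takes the leading rows of all three dataframes.

```python
import dataclasses
from typing import Optional

import pandas as pd


@dataclasses.dataclass(frozen=True)
class DataTupleSketch:
    """Hypothetical stand-in for DataTuple (not the library class)."""

    x: pd.DataFrame  # input features
    s: pd.DataFrame  # sensitive attributes
    y: pd.DataFrame  # class labels
    name: Optional[str] = None

    def __len__(self) -> int:
        # Assumption: the length is the number of rows (samples) in x.
        return len(self.x)

    def get_subset(self, num: int = 500) -> "DataTupleSketch":
        # Keep the first `num` rows of every dataframe.
        return dataclasses.replace(
            self, x=self.x.iloc[:num], s=self.s.iloc[:num], y=self.y.iloc[:num]
        )

    def replace(self, **changes) -> "DataTupleSketch":
        # Copy the tuple, swapping in only the fields that were passed.
        return dataclasses.replace(self, **changes)


data = DataTupleSketch(
    x=pd.DataFrame({"age": [25, 40, 31]}),
    s=pd.DataFrame({"sex": [0, 1, 1]}),
    y=pd.DataFrame({"hired": [1, 0, 1]}),
    name="toy",
)
subset = data.get_subset(num=2)
renamed = data.replace(name="toy-renamed")
```

The sketch is immutable, so `replace` returns a new object and leaves `data` untouched; whether the real class shares or copies the underlying dataframes is not stated in this reference.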

class Heaviside#

Bases: ethicml.utility.activation.Activation

Decision function that accepts predictions with a score of 50% or above.

apply(soft_output)#

Apply the decision function to each element of an ndarray.

Parameters

soft_output (numpy.ndarray) – soft prediction (i.e. a probability or logits)

Return type

numpy.ndarray

get_name()#

Getter for name of decision function.

Return type

str
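
To make the relationship between the abstract base class and a concrete decision function more tangible, here is a small sketch. `ActivationSketch` and `ThresholdSketch` are hypothetical names, and the sketch assumes that "50% or above" means an inclusive threshold of 0.5 on probabilities; it is not the library's code.

```python
from abc import ABC, abstractmethod

import numpy as np


class ActivationSketch(ABC):
    """Hypothetical stand-in for the Activation base class."""

    @abstractmethod
    def apply(self, soft_output: np.ndarray) -> np.ndarray:
        """Turn soft predictions into decisions."""

    @abstractmethod
    def get_name(self) -> str:
        """Name of the decision function."""


class ThresholdSketch(ActivationSketch):
    """Accept every score of 0.5 or above (assumed reading of "50% or above")."""

    def apply(self, soft_output: np.ndarray) -> np.ndarray:
        # Element-wise comparison; True becomes 1 and False becomes 0.
        return (soft_output >= 0.5).astype(int)

    def get_name(self) -> str:
        return "threshold-sketch"  # hypothetical name, not the library's


decisions = ThresholdSketch().apply(np.array([0.2, 0.5, 0.9]))
```

With these inputs the score of exactly 0.5 is accepted, which is the boundary case the description calls out.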

class Prediction(hard, info=None)#

Bases: object

Prediction of an algorithm.

Make a prediction object.

Parameters
  • hard (pd.Series) – hard predictions (e.g. 0 and 1)

  • info (Optional[Dict[str, float]]) – additional info about the prediction

__len__()#

Length of the predictions object.

Return type

int

static from_npz(npz_path)#

Load prediction from npz file.

Parameters

npz_path (pathlib.Path) – path of the npz file to load

Return type

ethicml.utility.data_structures.Prediction

property hard: pd.Series#

Hard predictions (e.g. 0 and 1).

property info: Dict[str, float]#

Additional info about the prediction.

to_npz(npz_path)#

Save prediction as npz file.

Parameters

npz_path (pathlib.Path) – path of the npz file to write

Return type

None
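
The save/load pair above amounts to a round trip through NumPy's npz format. The following sketch shows that round trip with plain NumPy and pandas; the archive key `"hard"` and the file name are assumptions made for illustration, and the layout that the library actually writes is not documented here.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

hard = pd.Series([0, 1, 1, 0])  # toy hard predictions

with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = Path(tmp_dir) / "prediction.npz"
    # Store the raw values under a key; "hard" is a hypothetical key name.
    np.savez(npz_path, hard=hard.to_numpy())
    # Load the archive again and rebuild the series from the stored array.
    with np.load(npz_path) as archive:
        restored = pd.Series(archive["hard"])
```

Only the values survive this particular round trip; the index and the `info` dictionary would need their own entries in the archive.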

class ResultsAggregator(initial=None)#

Bases: object

Aggregate results.

Initialise the results aggregator object.

Parameters

initial (Optional[pd.DataFrame]) –

append_df(data_frame, prepend=False)#

Append (or prepend) a DataFrame to this object.

Parameters
  • data_frame (pandas.DataFrame) – the DataFrame to add

  • prepend (bool) – if True, prepend the DataFrame instead of appending it

Return type

None

append_from_csv(csv_file, prepend=False)#

Append results from a CSV file.

Parameters
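  • csv_file (pathlib.Path) – path of the CSV file to read the results from

  • prepend (bool) – if True, prepend the results instead of appending them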

Return type

bool

property results: ethicml.utility.data_structures.Results#

Results object over which this class is aggregating.

save_as_csv(file_path)#

Save the results to a CSV file.

Parameters

file_path (pathlib.Path) – path of the CSV file to write

Return type

None

class SoftPrediction(soft, info=None)#

Bases: ethicml.utility.data_structures.Prediction

Prediction of an algorithm that makes soft predictions.

Make a soft prediction object.

Parameters
  • soft (pd.Series) – soft predictions (e.g. 0.2 and 0.8)

  • info (Optional[Dict[str, float]]) – additional info about the prediction

__len__()#

Length of the predictions object.

Return type

int

static from_npz(npz_path)#

Load prediction from npz file.

Parameters

npz_path (pathlib.Path) – path of the npz file to load

Return type

ethicml.utility.data_structures.Prediction

property hard: pd.Series#

Hard predictions (e.g. 0 and 1).

property info: Dict[str, float]#

Additional info about the prediction.

property soft: pd.Series#

Soft predictions (e.g. 0.2 and 0.8).

to_npz(npz_path)#

Save prediction as npz file.

Parameters

npz_path (pathlib.Path) – path of the npz file to write

Return type

None

class TestTuple(x, s, name=None)#

Bases: object

A tuple of dataframes for the features and the sensitive attribute.

Make a TestTuple.

Parameters
  • x (pd.DataFrame) – input features

  • s (pd.DataFrame) – sensitive attributes

  • name (Optional[str]) – optional name of the dataset

classmethod from_npz(data_path)#

Load test tuple from npz file.

Parameters

data_path (pathlib.Path) – path of the npz file to load

Return type

ethicml.utility.data_structures.TestTuple

property name: Optional[str]#

Getter for name property.

replace(*, x=None, s=None, name=None)#

Create a copy of the TestTuple but change the given values.

Parameters
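  • x (Optional[pd.DataFrame]) – new input features; the current ones are kept if None

  • s (Optional[pd.DataFrame]) – new sensitive attributes; the current ones are kept if None

  • name (Optional[str]) – new name; the current one is kept if None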

Return type

TestTuple

property s: pandas.DataFrame#

Getter for property s.

to_npz(data_path)#

Save TestTuple as an npz file.

Parameters

data_path (pathlib.Path) – path of the npz file to write

Return type

None

property x: pandas.DataFrame#

Getter for property x.

class TrainTestPair(train, test)#

Bases: NamedTuple

2-Tuple of train and test data.

Create new instance of TrainTestPair(train, test)

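Parameters
  • train (ethicml.utility.data_structures.DataTuple) – training data

  • test (ethicml.utility.data_structures.TestTuple) – test data

__len__()#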

Return len(self).

count(value, /)#

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)#

Return first index of value.

Raises ValueError if the value is not present.

test: ethicml.utility.data_structures.TestTuple#

Alias for field number 1

train: ethicml.utility.data_structures.DataTuple#

Alias for field number 0
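
Because the pair is a NamedTuple, it supports both attribute access and positional unpacking. The sketch below uses a hypothetical `TrainTestPairSketch` with placeholder strings instead of real data tuples, purely to show the field order (train is field 0, test is field 1).

```python
from typing import Any, NamedTuple


class TrainTestPairSketch(NamedTuple):
    """Hypothetical stand-in with the same field order as TrainTestPair."""

    train: Any  # field number 0
    test: Any  # field number 1


pair = TrainTestPairSketch(train="train data", test="test data")

# Positional unpacking follows the field order.
train, test = pair

# The inherited tuple methods work on the field values.
position_of_test = pair.index("test data")
```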

aggregate_results(results, metrics, aggregator=('mean', 'std'))#

Aggregate results over the repeats.

Parameters
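  • results (ethicml.utility.data_structures.Results) – results to aggregate

  • metrics (List[str]) – names of the metrics to aggregate

  • aggregator (Union[str, Tuple[str, ...]]) – name(s) of the aggregation function(s) to apply (default: mean and standard deviation)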

Return type

pandas.DataFrame
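
Aggregating over repeats can be pictured as a pandas group-by over every index level except the one that identifies the repeat. The sketch below is an illustration under assumptions, not the library's code: the five level names (in particular `"split_id"`) and the toy numbers are made up, and `aggregate_sketch` is a hypothetical helper.

```python
import pandas as pd

# Assumed index levels; the last one identifies the repeat.
LEVELS = ["dataset", "scaler", "transform", "model", "split_id"]

index = pd.MultiIndex.from_tuples(
    [
        ("toy", "none", "none", "model_a", "0"),
        ("toy", "none", "none", "model_a", "1"),
        ("toy", "none", "none", "model_b", "0"),
        ("toy", "none", "none", "model_b", "1"),
    ],
    names=LEVELS,
)
results = pd.DataFrame({"Accuracy": [0.8, 0.9, 0.6, 0.7]}, index=index)


def aggregate_sketch(results, metrics, aggregator=("mean", "std")):
    # Group rows that differ only in the repeat level.
    grouped = results.groupby(level=LEVELS[:-1])
    # pandas expects a single name or a list of names.
    funcs = aggregator if isinstance(aggregator, str) else list(aggregator)
    return grouped[metrics].agg(funcs)


summary = aggregate_sketch(results, metrics=["Accuracy"])
```

With a tuple of aggregators, the resulting columns form a two-level index of (metric, aggregator) pairs, one row per remaining combination of index values.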

concat_dt(datatup_list, axis='index', ignore_index=False)#

Concatenate the data tuples in the given list.

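Parameters
  • datatup_list – the data tuples to concatenate

  • axis – axis along which to concatenate (default: 'index')

  • ignore_index – whether to discard the existing index values when concatenating (default: False)

Return type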

ethicml.utility.data_structures.DataTuple
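
Concatenating data tuples can be thought of as concatenating the x, s and y dataframes separately and wrapping the results again. The sketch below assumes exactly that and forwards `axis` and `ignore_index` to `pandas.concat`; `TupleSketch` and `concat_sketch` are hypothetical names, not library objects.

```python
from typing import NamedTuple, Sequence

import pandas as pd


class TupleSketch(NamedTuple):
    """Hypothetical stand-in for a data tuple."""

    x: pd.DataFrame
    s: pd.DataFrame
    y: pd.DataFrame


def concat_sketch(tuples: Sequence[TupleSketch], axis="index", ignore_index=False):
    # Concatenate each component on its own, then rebuild the tuple.
    return TupleSketch(
        x=pd.concat([t.x for t in tuples], axis=axis, ignore_index=ignore_index),
        s=pd.concat([t.s for t in tuples], axis=axis, ignore_index=ignore_index),
        y=pd.concat([t.y for t in tuples], axis=axis, ignore_index=ignore_index),
    )


first = TupleSketch(
    x=pd.DataFrame({"age": [25, 40]}),
    s=pd.DataFrame({"sex": [0, 1]}),
    y=pd.DataFrame({"hired": [1, 0]}),
)
second = TupleSketch(
    x=pd.DataFrame({"age": [31]}),
    s=pd.DataFrame({"sex": [1]}),
    y=pd.DataFrame({"hired": [1]}),
)
combined = concat_sketch([first, second], ignore_index=True)
```

Passing `ignore_index=True` renumbers the rows from 0; with the default, the original index values (here 0, 1, 0) are kept.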

concat_tt(datatup_list, axis='index', ignore_index=False)#

Concatenate the test tuples in the given list.

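Parameters
  • datatup_list – the test tuples to concatenate

  • axis – axis along which to concatenate (default: 'index')

  • ignore_index – whether to discard the existing index values when concatenating (default: False)

Return type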

ethicml.utility.data_structures.TestTuple

filter_and_map_results(results, mapping)#

Filter entries and change the index with a mapping.

Parameters
  • results (ethicml.utility.data_structures.Results) –

  • mapping (Mapping[str, str]) –

Return type

ethicml.utility.data_structures.Results

filter_results(results, values, index='model')#

Filter the entries based on the given values.

Parameters
  • results (ethicml.utility.data_structures.Results) – results to filter

  • values (Iterable) – values to filter by

  • index (Literal['dataset', 'scaler', 'transform', 'model']) – name of the index level to filter on

Return type

ethicml.utility.data_structures.Results
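
One way to picture this filter is as a boolean mask over a single level of the results index. The sketch below assumes exact matching against the given values, which this reference does not confirm; `filter_sketch`, the level names and the toy data are all hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("toy", "none", "none", "model_a", "0"),
        ("toy", "none", "none", "model_b", "0"),
        ("toy", "none", "none", "model_c", "0"),
    ],
    names=["dataset", "scaler", "transform", "model", "split_id"],  # assumed names
)
results = pd.DataFrame({"Accuracy": [0.8, 0.6, 0.7]}, index=index)


def filter_sketch(results, values, index="model"):
    # Keep the rows whose chosen index level is one of the given values.
    keep = results.index.get_level_values(index).isin(list(values))
    return results.loc[keep]


filtered = filter_sketch(results, values=["model_a", "model_c"])
```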

make_results(data_frame=None)#

Initialise Results object.

You should always use this function instead of using the “constructor” directly, because this function checks whether the columns are correct.

Parameters

data_frame (Union[None, pd.DataFrame, Path]) –

Return type

Results

map_over_results_index(results, mapper)#

Change the values of the index with a transformation function.

Parameters
  • results (ethicml.utility.data_structures.Results) –

  • mapper (Callable[[Tuple[str, str, str, str, str]], Tuple[str, str, str, str, str]]) –

Return type

ethicml.utility.data_structures.Results
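
The mapper receives one index entry as a 5-tuple of strings and returns the replacement 5-tuple. A possible sketch of that idea with plain pandas follows; `map_index_sketch`, the level names and the toy data are assumptions for illustration, not the library's implementation.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("toy", "none", "none", "model_a", "0"),
        ("toy", "none", "none", "model_b", "0"),
    ],
    names=["dataset", "scaler", "transform", "model", "split_id"],  # assumed names
)
results = pd.DataFrame({"Accuracy": [0.8, 0.6]}, index=index)


def map_index_sketch(results, mapper):
    # Apply the mapper to every index entry and rebuild the index.
    new_index = pd.MultiIndex.from_tuples(
        [mapper(entry) for entry in results.index], names=results.index.names
    )
    mapped = results.copy()  # leave the input untouched
    mapped.index = new_index
    return mapped


def upper_case_model(entry):
    # Change only the fourth element (the model name).
    dataset, scaler, transform, model, split_id = entry
    return (dataset, scaler, transform, model.upper(), split_id)


mapped = map_index_sketch(results, upper_case_model)
```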

shuffle_df(df, random_state)#

Shuffle a given dataframe.

Parameters
  • df (pandas.DataFrame) – the dataframe to shuffle

  • random_state (int) – seed for the random number generator

Return type

pandas.DataFrame
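
A seeded shuffle of a dataframe is commonly written with `DataFrame.sample`. The sketch below shows that idiom; resetting the index afterwards is an assumption about the intended behaviour, and `shuffle_sketch` is a hypothetical name rather than the library function.

```python
import pandas as pd


def shuffle_sketch(df: pd.DataFrame, random_state: int) -> pd.DataFrame:
    # Sampling 100% of the rows without replacement yields a permutation.
    shuffled = df.sample(frac=1.0, random_state=random_state)
    # Assumption: renumber the rows so the index starts at 0 again.
    return shuffled.reset_index(drop=True)


frame = pd.DataFrame({"value": [1, 2, 3, 4, 5]})
shuffled = shuffle_sketch(frame, random_state=42)
shuffled_again = shuffle_sketch(frame, random_state=42)
```

Using the same `random_state` twice gives the same order, which is what makes experiments repeatable.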

undo_one_hot(df, new_column_name=None)#

Undo one-hot encoding.

Parameters
  • df (pandas.DataFrame) –

  • new_column_name (Optional[str]) –

Return type

Union[pandas.Series, pandas.DataFrame]
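
Undoing a one-hot encoding means mapping each row back to the name of the column that holds the 1. The sketch below does this with `DataFrame.idxmax`; it is a guess at the behaviour suggested by the signature (a Series when no column name is given, a one-column DataFrame otherwise), and `undo_one_hot_sketch` is a hypothetical name.

```python
from typing import Optional, Union

import pandas as pd


def undo_one_hot_sketch(
    df: pd.DataFrame, new_column_name: Optional[str] = None
) -> Union[pd.Series, pd.DataFrame]:
    # For each row, find the column that contains the largest value (the 1).
    labels = df.idxmax(axis="columns")
    if new_column_name is None:
        return labels
    # Assumption: a given name means the caller wants a one-column dataframe.
    return labels.to_frame(name=new_column_name)


one_hot = pd.DataFrame(
    {
        "colour_red": [1, 0, 0],
        "colour_blue": [0, 1, 0],
        "colour_green": [0, 0, 1],
    }
)
labels = undo_one_hot_sketch(one_hot)
frame = undo_one_hot_sketch(one_hot, new_column_name="colour")
```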