Utils

This module contains utilities that do not (yet) belong anywhere else.

Classes:

`Activation`	Base class for decision functions.
`DataTuple`	A tuple of dataframes for the features, the sensitive attribute and the class labels.
`Heaviside`	Decision function that accepts predictions with score of 50% or above.
`Prediction`	Prediction of an algorithm.
`ResultsAggregator`	Aggregate results.
`SoftPrediction`	Prediction of an algorithm that makes soft predictions.
`TestTuple`	A tuple of dataframes for the features and the sensitive attribute.
`TrainTestPair`	2-Tuple of train and test data.

Functions:

`aggregate_results`	Aggregate results over the repeats.
`concat_dt`	Concatenate the data tuples in the given list.
`concat_tt`	Concatenate the test tuples in the given list.
`filter_and_map_results`	Filter entries and change the index with a mapping.
`filter_results`	Filter the entries based on the given values.
`make_results`	Initialise Results object.
`map_over_results_index`	Change the values of the index with a transformation function.
`shuffle_df`	Shuffle a given dataframe.
`undo_one_hot`	Undo one-hot encoding.

class Activation

Bases: abc.ABC

Base class for decision functions.

abstract apply(soft_output)

Apply the decision function to a soft prediction.

Parameters: soft_output (numpy.ndarray) – soft prediction (i.e. a probability or logits)
Returns: decision
Return type: numpy.ndarray

abstract get_name()

Name of activation function.

Return type: str

class DataTuple(x, s, y, name=None)

Bases: ethicml.utility.data_structures.TestTuple

A tuple of dataframes for the features, the sensitive attribute and the class labels.

Parameters

x (pd.DataFrame) – input features
s (pd.DataFrame) – sensitive attributes
y (pd.DataFrame) – class labels
name (Optional[str]) – optional name of the dataset

Make a DataTuple.

__len__()

Override the __len__ magic method.

Return type: int

apply_to_joined_df(mapper)

Concatenate the dataframes in the DataTuple and then apply a function to the result.

classmethod from_npz(data_path)

Load data tuple from npz file.

Parameters: data_path (pathlib.Path) – path of the npz file to load
Return type: ethicml.utility.data_structures.DataTuple

get_subset(num=500)

Get the first num elements of the dataset.

Parameters: num (int) – how many samples to take for subset
Returns: subset of training data
Return type: ethicml.utility.data_structures.DataTuple

property name: Optional[str]

Getter for name property.

remove_y()

Convert the DataTuple instance to a TestTuple instance.

Return type: ethicml.utility.data_structures.TestTuple

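replace(*, x=None, s=None, name=None, y=None)

Create a copy of the DataTuple but change the given values.

Parameters

x (Optional[pd.DataFrame]) – new input features; the current value is kept if None
s (Optional[pd.DataFrame]) – new sensitive attributes; the current value is kept if None
name (Optional[str]) – new name of the dataset; the current value is kept if None
y (Optional[pd.DataFrame]) – new class labels; the current value is kept if None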

Return type

DataTuple

property s: pandas.DataFrame

Getter for property s.

to_npz(data_path)

Save DataTuple as an npz file.

Parameters: data_path (pathlib.Path) – path of the npz file to write
Return type: None

property x: pandas.DataFrame

Getter for property x.

property y: pandas.DataFrame

Getter for property y.
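To make the DataTuple interface above more concrete, here is a minimal sketch of a container with the same fields. `MiniDataTuple` is a hypothetical stand-in written for illustration, not the library class, and it assumes that the length is the number of rows in x and that `get_subset` slices every dataframe identically.

```python
from dataclasses import dataclass, replace
from typing import Optional

import pandas as pd


@dataclass(frozen=True)
class MiniDataTuple:
    """Hypothetical stand-in for DataTuple; not the library class."""

    x: pd.DataFrame
    s: pd.DataFrame
    y: pd.DataFrame
    name: Optional[str] = None

    def __len__(self) -> int:
        # Assumption: the length is the number of samples in x.
        return len(self.x)

    def get_subset(self, num: int = 500) -> "MiniDataTuple":
        # Keep the first `num` rows of every dataframe.
        return replace(
            self, x=self.x.iloc[:num], s=self.s.iloc[:num], y=self.y.iloc[:num]
        )


data = MiniDataTuple(
    x=pd.DataFrame({"age": [23, 35, 41, 58]}),
    s=pd.DataFrame({"sex": [0, 1, 0, 1]}),
    y=pd.DataFrame({"label": [1, 0, 0, 1]}),
    name="toy",
)
subset = data.get_subset(num=2)
renamed = replace(data, name="toy-renamed")  # copy with one field changed
```

Here `dataclasses.replace` plays the role that the documented `replace` method plays on the real class: it returns a copy and leaves the original untouched.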

class Heaviside

Bases: ethicml.utility.activation.Activation

Decision function that accepts predictions with score of 50% or above.

apply(soft_output)

Apply the decision function to each element of an ndarray.

Parameters: soft_output (numpy.ndarray) –
Return type: numpy.ndarray

get_name()

Getter for name of decision function.

Return type: str
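The following sketch shows what a decision function of this kind might look like. Both classes are stand-ins defined here for illustration (nothing is imported from ethicml), `ThresholdAtHalf` is a hypothetical name, and the sketch assumes the soft output holds probabilities rather than logits.

```python
from abc import ABC, abstractmethod

import numpy as np


class Activation(ABC):
    """Stand-in mirroring the documented base class interface."""

    @abstractmethod
    def apply(self, soft_output: np.ndarray) -> np.ndarray:
        """Turn soft predictions into decisions."""

    @abstractmethod
    def get_name(self) -> str:
        """Return the name of the decision function."""


class ThresholdAtHalf(Activation):
    """Hypothetical decision function: accept scores of 0.5 or above."""

    def apply(self, soft_output: np.ndarray) -> np.ndarray:
        # Element-wise comparison; True becomes 1 and False becomes 0.
        return (soft_output >= 0.5).astype(int)

    def get_name(self) -> str:
        return "ThresholdAtHalf"


decisions = ThresholdAtHalf().apply(np.array([0.1, 0.5, 0.9]))
print(decisions)  # [0 1 1]
```

A score of exactly 0.5 is accepted, matching the "50% or above" wording. For logits, the corresponding threshold would be 0 rather than 0.5.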

class Prediction(hard, info=None)

Bases: object

Prediction of an algorithm.

Make a prediction object.

Parameters

hard (pd.Series) – hard predictions (e.g. 0 and 1)
info (Optional[Dict[str, float]]) – additional info about the prediction

__len__()

Length of the predictions object.

Return type: int

static from_npz(npz_path)

Load prediction from npz file.

Parameters: npz_path (pathlib.Path) – path of the npz file to load
Return type: ethicml.utility.data_structures.Prediction

property hard: pd.Series

Hard predictions (e.g. 0 and 1).

property info: Dict[str, float]

Additional info about the prediction.

to_npz(npz_path)

Save prediction as npz file.

Parameters: npz_path (pathlib.Path) – path of the npz file to write
Return type: None

class ResultsAggregator(initial=None)

Bases: object

Aggregate results.

Initialise the results aggregator object.

Parameters: initial (Optional[pd.DataFrame]) – optional initial results

append_df(data_frame, prepend=False)

Append (or prepend) a DataFrame to this object.

Parameters

data_frame (pandas.DataFrame) – the DataFrame to add
prepend (bool) – if True, prepend the DataFrame instead of appending it

Return type

None

append_from_csv(csv_file, prepend=False)

Append results from a CSV file.

Parameters

csv_file (pathlib.Path) – path of the CSV file to read
prepend (bool) – if True, prepend the results instead of appending them

Return type

bool

property results: ethicml.utility.data_structures.Results

Results object over which this class is aggregating.

save_as_csv(file_path)

Save the results to a CSV file.

Parameters: file_path (pathlib.Path) – path of the CSV file to write
Return type: None

class SoftPrediction(soft, info=None)

Bases: ethicml.utility.data_structures.Prediction

Prediction of an algorithm that makes soft predictions.

Make a soft prediction object.

Parameters

soft (pd.Series) – soft predictions (e.g. 0.2 and 0.8)
info (Optional[Dict[str, float]]) – additional info about the prediction

__len__()

Length of the predictions object.

Return type: int

static from_npz(npz_path)

Load prediction from npz file.

Parameters: npz_path (pathlib.Path) – path of the npz file to load
Return type: ethicml.utility.data_structures.Prediction

property hard: pd.Series

Hard predictions (e.g. 0 and 1).

property info: Dict[str, float]

Additional info about the prediction.

property soft: pd.Series

Soft predictions (e.g. 0.2 and 0.8).

to_npz(npz_path)

Save prediction as npz file.

Parameters: npz_path (pathlib.Path) – path of the npz file to write
Return type: None
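A SoftPrediction exposes both soft and hard predictions. The page does not say how the hard values are obtained from the soft ones. The sketch below assumes a threshold of 0.5 on probabilities, which is only one plausible choice and should not be read as the library's confirmed behaviour.

```python
import pandas as pd

# Soft predictions, e.g. the predicted probability of the positive class.
soft = pd.Series([0.2, 0.8, 0.5])

# Assumed rule: scores of 0.5 or above become 1, everything else becomes 0.
hard = soft.ge(0.5).astype(int)

print(hard.tolist())  # [0, 1, 1]
```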

class TestTuple(x, s, name=None)

Bases: object

A tuple of dataframes for the features and the sensitive attribute.

Make a TestTuple.

Parameters

x (pd.DataFrame) – input features
s (pd.DataFrame) – sensitive attributes
name (Optional[str]) – optional name of the dataset

classmethod from_npz(data_path)

Load test tuple from npz file.

Parameters: data_path (pathlib.Path) – path of the npz file to load
Return type: ethicml.utility.data_structures.TestTuple

property name: Optional[str]

Getter for name property.

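replace(*, x=None, s=None, name=None)

Create a copy of the TestTuple but change the given values.

Parameters

x (Optional[pd.DataFrame]) – new input features; the current value is kept if None
s (Optional[pd.DataFrame]) – new sensitive attributes; the current value is kept if None
name (Optional[str]) – new name of the dataset; the current value is kept if None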

Return type

TestTuple

property s: pandas.DataFrame

Getter for property s.

to_npz(data_path)

Save TestTuple as an npz file.

Parameters: data_path (pathlib.Path) – path of the npz file to write
Return type: None

property x: pandas.DataFrame

Getter for property x.

class TrainTestPair(train, test)

Bases: NamedTuple

2-Tuple of train and test data.

Create new instance of TrainTestPair(train, test)

Parameters

train (ethicml.utility.data_structures.DataTuple) –
test (ethicml.utility.data_structures.TestTuple) –

__len__()

Return len(self).

count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

test: ethicml.utility.data_structures.TestTuple

Alias for field number 1

train: ethicml.utility.data_structures.DataTuple

Alias for field number 0
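Because TrainTestPair is a NamedTuple, it supports both attribute access and ordinary tuple operations. The sketch below uses a hypothetical stand-in, `Pair`, with plain strings in place of the data objects, so it runs without the library.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """Hypothetical stand-in with the same field order as TrainTestPair."""

    train: str
    test: str


pair = Pair(train="train-data", test="test-data")

# Fields are reachable by name or by position (train is 0, test is 1).
assert pair.train == pair[0]
assert pair.test == pair[1]

# A NamedTuple unpacks like an ordinary tuple.
train, test = pair

# The inherited tuple methods behave as documented above.
n_fields = len(pair)  # 2
position = pair.index("test-data")  # 1
occurrences = pair.count("train-data")  # 1
```

Note that `len` on the pair is always 2 (the number of fields), not the number of samples in either dataset.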

aggregate_results(results, metrics, aggregator=('mean', 'std'))

Aggregate results over the repeats.

Parameters

results (ethicml.utility.data_structures.Results) –
metrics (List[str]) –
aggregator (Union[str, Tuple[str, ...]]) –

Return type

pandas.DataFrame
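As a rough illustration of aggregating over repeats, the sketch below works on a plain pandas DataFrame. The index layout (a "model" level plus a repeat identifier named "split_id") and the metric name "Accuracy" are assumptions made for this example; the real Results object may be organised differently.

```python
import pandas as pd

# Assumed layout: one row per (model, repeat), one column per metric.
index = pd.MultiIndex.from_tuples(
    [("svm", 0), ("svm", 1), ("lr", 0), ("lr", 1)],
    names=["model", "split_id"],
)
results = pd.DataFrame({"Accuracy": [0.80, 0.90, 0.70, 0.74]}, index=index)

metrics = ["Accuracy"]
aggregator = ("mean", "std")

# Group by everything except the repeat identifier, then aggregate each metric.
aggregated = results.groupby(level="model")[metrics].agg(list(aggregator))
print(aggregated)
```

The result has one row per model and a two-level column index of (metric, aggregation), for example ("Accuracy", "mean").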

concat_dt(datatup_list, axis='index', ignore_index=False)

Concatenate the data tuples in the given list.

Parameters

datatup_list (Sequence[ethicml.utility.data_structures.DataTuple]) –
axis (Literal['columns', 'index']) –
ignore_index (bool) –

Return type

ethicml.utility.data_structures.DataTuple
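Concatenating data tuples can be pictured as concatenating their x, s and y dataframes separately. The sketch below shows this with plain tuples of dataframes and `pandas.concat`. It assumes this component-wise behaviour and does not use the library's classes.

```python
import pandas as pd

# Two toy datasets, each given as (x, s, y) dataframes.
first = (
    pd.DataFrame({"age": [23, 35]}),
    pd.DataFrame({"sex": [0, 1]}),
    pd.DataFrame({"label": [1, 0]}),
)
second = (
    pd.DataFrame({"age": [41]}),
    pd.DataFrame({"sex": [0]}),
    pd.DataFrame({"label": [1]}),
)

# Concatenate like with like: all x frames, then all s frames, then all y frames.
x, s, y = (
    pd.concat(list(frames), axis="index", ignore_index=True)
    for frames in zip(first, second)
)
```

With `ignore_index=True` the rows are renumbered from 0. With the default `ignore_index=False` the original row labels are kept, so labels can repeat across the concatenated parts.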

concat_tt(datatup_list, axis='index', ignore_index=False)

Concatenate the test tuples in the given list.

Parameters

datatup_list (List[ethicml.utility.data_structures.TestTuple]) –
axis (Literal['columns', 'index']) –
ignore_index (bool) –

Return type

ethicml.utility.data_structures.TestTuple

filter_and_map_results(results, mapping)

Filter entries and change the index with a mapping.

Parameters

results (ethicml.utility.data_structures.Results) –
mapping (Mapping[str, str]) –

Return type

ethicml.utility.data_structures.Results

filter_results(results, values, index='model')

Filter the entries based on the given values.

Parameters

results (ethicml.utility.data_structures.Results) –
values (Iterable) –
index (Literal['dataset', 'scaler', 'transform', 'model']) –

Return type

ethicml.utility.data_structures.Results
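Filtering by an index level can be sketched with plain pandas as follows. The example assumes exact matching against the given values. Whether the library matches exactly or by substring is not stated on this page, and the two-level index is a simplification.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("adult", "svm"), ("adult", "lr"), ("compas", "svm")],
    names=["dataset", "model"],
)
results = pd.DataFrame({"Accuracy": [0.80, 0.70, 0.65]}, index=index)

values = ["svm"]

# Keep rows whose "model" index level is one of the requested values.
mask = results.index.get_level_values("model").isin(values)
filtered = results.loc[mask]
```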

make_results(data_frame=None)

Initialise Results object.

You should always use this function instead of using the “constructor” directly, because this function checks whether the columns are correct.

Parameters: data_frame (Union[None, pd.DataFrame, Path]) –
Return type: Results

map_over_results_index(results, mapper)

Change the values of the index with a transformation function.

Parameters

results (ethicml.utility.data_structures.Results) –
mapper (Callable[[Tuple[str, str, str, str, str]], Tuple[str, str, str, str, str]]) –

Return type

ethicml.utility.data_structures.Results
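The mapper receives each index entry as a 5-tuple of strings and returns a new 5-tuple. The sketch below applies such a mapper to a plain pandas MultiIndex. The first four level names follow the values accepted by filter_results, while the fifth name, "split_id", and the level order are assumptions.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("adult", "none", "none", "svm", "0"), ("adult", "none", "none", "lr", "0")],
    names=["dataset", "scaler", "transform", "model", "split_id"],
)
results = pd.DataFrame({"Accuracy": [0.80, 0.70]}, index=index)


def mapper(entry):
    # Hypothetical transformation: upper-case the fourth field only.
    dataset, scaler, transform, model, split_id = entry
    return (dataset, scaler, transform, model.upper(), split_id)


# Build a new index entry by entry and attach it to a copy of the data.
renamed = results.copy()
renamed.index = pd.MultiIndex.from_tuples(
    [mapper(entry) for entry in results.index], names=results.index.names
)
```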

shuffle_df(df, random_state)

Shuffle a given dataframe.

Parameters

df (pandas.DataFrame) –
random_state (int) –

Return type

pandas.DataFrame
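One common way to shuffle a dataframe reproducibly is `DataFrame.sample` with `frac=1`. The sketch below shows that technique. It is an assumption that the function works this way, and whether the row index is reset afterwards is not stated here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"]})

# frac=1 samples every row exactly once, i.e. a permutation of the rows.
shuffled = df.sample(frac=1, random_state=42)

# The same seed gives the same permutation.
again = df.sample(frac=1, random_state=42)
```

Each row stays intact: only the order of the rows changes, and the original row labels travel with them unless `reset_index(drop=True)` is called on the result.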

undo_one_hot(df, new_column_name=None)

Undo one-hot encoding.

Parameters

df (pandas.DataFrame) –
new_column_name (Optional[str]) –

Return type

Union[pandas.Series, pandas.DataFrame]
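Undoing a one-hot encoding amounts to finding, for each row, the column that holds the 1. The sketch below does this with `DataFrame.idxmax`. The column names are made up, and the link between `new_column_name` and the Series or DataFrame return type is an assumption based on the signature.

```python
import pandas as pd

one_hot = pd.DataFrame(
    {"race_a": [1, 0, 0], "race_b": [0, 1, 0], "race_c": [0, 0, 1]}
)

# For each row, take the label of the column that holds the 1.
as_series = one_hot.idxmax(axis="columns")

# With a column name, wrap the result in a single-column dataframe.
as_frame = as_series.to_frame(name="race")

print(as_series.tolist())  # ['race_a', 'race_b', 'race_c']
```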