Utils#

Module for useful things that don’t really belong anywhere else (just yet).

Classes:

Activation

Base class for decision functions.

ClassifierType

Classifier type.

DataTuple

A tuple of dataframes for the features, the sensitive attribute and the class labels.

FairnessType

Fairness type.

Heaviside

Decision function that accepts predictions with a score of 50% or above.

KernelType

Values for SVM Kernel.

LabelTuple

A tuple of dataframes for the sensitive attribute and the class labels.

ModelType

What to use as the underlying model for the fairness method.

Prediction

Prediction of an algorithm.

ResultsAggregator

Aggregate results.

SoftPrediction

Prediction of an algorithm that makes soft predictions.

SubgroupTuple

A tuple of dataframes for the features and the sensitive attribute.

TrainTestPair

2-Tuple of train and test data.

TrainValPair

2-Tuple of train and validation data.

Functions:

aggregate_results

Aggregate results over the repeats.

concat

Concatenate the data tuples in the given list.

filter_and_map_results

Filter entries and change the index with a mapping.

filter_results

Filter the entries based on the given values.

make_results

Initialise Results object.

map_over_results_index

Change the values of the index with a transformation function.

shuffle_df

Shuffle a given dataframe.

undo_one_hot

Undo one-hot encoding.

class Activation#

Bases: ABC

Base class for decision functions.

abstract apply(soft_output)#

Apply the decision function to a soft prediction.

Parameters:

soft_output (ndarray) – soft prediction (i.e. a probability or logits)

Returns:

decision

Return type:

ndarray

abstract get_name()#

Name of activation function.

Return type:

str
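To illustrate the interface, here is a sketch of a custom decision function. The `Activation` base class is re-created locally so the example is self-contained; the `Threshold` subclass and its cutoff parameter are hypothetical, not part of this module:

```python
import numpy as np
from abc import ABC, abstractmethod

# Minimal re-creation of the Activation interface for illustration only;
# the real base class is the one documented in this module.
class Activation(ABC):
    @abstractmethod
    def apply(self, soft_output: np.ndarray) -> np.ndarray: ...

    @abstractmethod
    def get_name(self) -> str: ...

class Threshold(Activation):
    """Hypothetical activation that accepts scores at or above a custom cutoff."""

    def __init__(self, cutoff: float = 0.5):
        self.cutoff = cutoff

    def apply(self, soft_output: np.ndarray) -> np.ndarray:
        # Turn soft scores into hard 0/1 decisions.
        return (soft_output >= self.cutoff).astype(np.int64)

    def get_name(self) -> str:
        return f"Threshold({self.cutoff})"

preds = Threshold(0.7).apply(np.array([0.2, 0.7, 0.9]))
```

With a cutoff of 0.5 this reduces to the behaviour described for `Heaviside` below.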

class ClassifierType(value)#

Bases: StrEnum

Classifier type.

gbt = 'gbt'#

Gradient Boosting.

lr = 'lr'#

Logistic Regression.

svm = 'svm'#

Support Vector Machine.

class DataTuple(data, s_column, y_column, s_in_x, name)#

Bases: SubsetMixin

A tuple of dataframes for the features, the sensitive attribute and the class labels.

Parameters:
  • data (DataFrame) –

  • s_column (str) –

  • y_column (str) –

  • s_in_x (bool) –

  • name (str | None) –

__len__()#

Number of entries in the underlying data.

Return type:

int

apply_to_joined_df(mapper)#

Concatenate the dataframes in the DataTuple and then apply a function to it.

Parameters:

mapper (Callable[[DataFrame], DataFrame]) – A function that takes a dataframe and returns a dataframe.

Returns:

The transformed DataTuple.

Return type:

Self

classmethod from_df(*, x, s, y, name=None)#

Make a DataTuple.

Parameters:
  • x (DataFrame) –

  • s (pd.Series[int]) –

  • y (pd.Series[int]) –

  • name (str | None) –

Return type:

Self

classmethod from_file(data_path)#

Load data tuple from npz file.

Parameters:

data_path (Path) – Path to the npz file.

Returns:

A DataTuple with the loaded data.

Return type:

Self

get_n_samples(num=500)#

Get the first elements of the dataset.

Parameters:

num (int) – How many samples to take for subset. (Default: 500)

Returns:

Subset of training data.

Return type:

Self

get_s_subset(s)#

Return a subset of the DataTuple where S=s.

Parameters:

s (int) –

Return type:

Self

remove_y()#

Convert the DataTuple instance to a SubgroupTuple instance.

Return type:

SubgroupTuple

rename(name)#

Change only the name.

Parameters:

name (str) –

Return type:

Self

replace(*, x=None, s=None, y=None)#

Create a copy of the DataTuple but change the given values.

Parameters:
  • x (DataFrame | None) –

  • s (Series | None) –

  • y (Series | None) –

Return type:

Self

replace_data(data, name=None)#

Make a copy of the DataTuple but change the underlying data.

Parameters:
  • data (DataFrame) –

  • name (str | None) –

Return type:

Self

property s: pd.Series[int]#

Getter for property s.

save_to_file(data_path)#

Save DataTuple as an npz file.

Parameters:

data_path (Path) – Path to the npz file.

Return type:

None

property x: DataFrame#

Getter for property x.

property y: pd.Series[int]#

Getter for property y.
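The three pieces a `DataTuple` bundles can be sketched in plain pandas. This is a conceptual illustration of what `from_df` packages and what `get_s_subset` selects, not the class's actual internals:

```python
import pandas as pd

# Features x, sensitive attribute s, and class labels y -- the inputs to from_df.
x = pd.DataFrame({"age": [25, 32, 47], "hours": [40, 38, 45]})
s = pd.Series([0, 1, 0], name="s")   # sensitive attribute
y = pd.Series([1, 0, 1], name="y")   # class labels

# Conceptually, the tuple stores these as one joined dataframe...
data = pd.concat([x, s, y], axis=1)

# ...from which a per-group subset (what get_s_subset(0) returns) is a row filter.
subset = data[data["s"] == 0]
```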

class FairnessType(value)#

Bases: StrEnum

Fairness type.

dp = 'dp'#

Demographic parity.

eq_odds = 'eq_odds'#

Equalized Odds.

eq_opp = 'eq_opp'#

Equality of Opportunity.
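As a quick reminder of what demographic parity measures, the acceptance rates being compared can be computed in plain pandas (this is an illustration of the definition, not code from this library):

```python
import pandas as pd

# Demographic parity compares acceptance rates across groups:
# P(y_hat = 1 | s = 0) vs. P(y_hat = 1 | s = 1).
preds = pd.Series([1, 0, 1, 1])  # hard predictions
s = pd.Series([0, 0, 1, 1])      # sensitive attribute

rate_by_group = preds.groupby(s).mean()          # acceptance rate per group
dp_gap = abs(rate_by_group[0] - rate_by_group[1])  # 0 means parity holds
```

Equalized odds and equality of opportunity are analogous but condition additionally on the true label y.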

class Heaviside#

Bases: Activation

Decision function that accepts predictions with a score of 50% or above.

apply(soft_output)#

Apply the decision function to each element of an ndarray.

Parameters:

soft_output (ndarray) – Soft predictions.

Returns:

Hard predictions.

Return type:

ndarray

get_name()#

Getter for name of decision function.

Return type:

str

class KernelType(value)#

Bases: StrEnum

Values for SVM Kernel.

linear = 'linear'#

Linear kernel.

poly = 'poly'#

Polynomial kernel.

rbf = 'rbf'#

Radial basis function kernel.

sigmoid = 'sigmoid'#

Sigmoid kernel.

class LabelTuple(data, s_column, y_column, name)#

Bases: SubsetMixin

A tuple of dataframes for the sensitive attribute and the class labels.

Parameters:
  • data (DataFrame) –

  • s_column (str) –

  • y_column (str) –

  • name (str | None) –

__len__()#

Number of entries in the underlying data.

Return type:

int

classmethod from_df(*, s, y, name=None)#

Make a LabelTuple.

Parameters:
  • s (pd.Series[int]) –

  • y (pd.Series[int]) –

  • name (str | None) –

Return type:

Self

classmethod from_np(*, s, y, s_name='s', y_name='y')#

Create a LabelTuple from numpy arrays.

Parameters:
  • s (ndarray[Any, dtype[_ScalarType_co]]) –

  • y (ndarray[Any, dtype[_ScalarType_co]]) –

  • s_name (str) –

  • y_name (str) –

Return type:

Self

get_n_samples(num=500)#

Get the first elements of the dataset.

Parameters:

num (int) – How many samples to take for subset. (Default: 500)

Returns:

Subset of training data.

Return type:

Self

get_s_subset(s)#

Return a subset of the LabelTuple where S=s.

Parameters:

s (int) –

Return type:

Self

rename(name)#

Change only the name.

Parameters:

name (str) –

Return type:

Self

replace(*, s=None, y=None)#

Create a copy of the LabelTuple but change the given values.

Parameters:
  • s (Series | None) –

  • y (Series | None) –

Return type:

Self

replace_data(data, name=None)#

Make a copy of the LabelTuple but change the underlying data.

Parameters:
  • data (DataFrame) –

  • name (str | None) –

Return type:

Self

property s: pd.Series[int]#

Getter for property s.

property y: pd.Series[int]#

Getter for property y.

class ModelType(value)#

Bases: StrEnum

What to use as the underlying model for the fairness method.

deep = 'deep'#

Deep neural network.

linear = 'linear'#

Linear model.

class Prediction(hard, info=None)#

Bases: object

Prediction of an algorithm.

Parameters:
  • hard (Series) –

  • info (Dict[str, bool | int | float | str] | None) –

__len__()#

Length of the predictions object.

Return type:

int

static from_file(npz_path)#

Load prediction from npz file.

Parameters:

npz_path (Path) – Path to the npz file.

Returns:

A Prediction object with the loaded data.

Return type:

Prediction

classmethod from_np(preds)#

Construct a prediction object from a numpy array.

Parameters:

preds (ndarray[Any, dtype[_ScalarType_co]]) –

Return type:

Self

get_s_subset(s_data, s)#

Return a subset of the predictions where S=s.

Parameters:
  • s_data (Series) – Dataframe with the s-values.

  • s (int) – S-value to get the subset for.

Returns:

The requested subset as a new Prediction object.

Return type:

Prediction
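The semantics of `get_s_subset` can be sketched with plain pandas: keep only those predictions whose corresponding sensitive-attribute value equals s. This is an illustration, not the method's actual implementation:

```python
import pandas as pd

hard = pd.Series([1, 0, 1, 1], name="preds")  # hard predictions
s_data = pd.Series([0, 1, 0, 1], name="s")    # aligned sensitive attribute

# Boolean row filter -- what the requested subset contains for s = 0.
subset = hard[s_data == 0]
```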

property hard: Series#

Hard predictions (e.g. 0 and 1).

property info: Dict[str, bool | int | float | str]#

Additional info about the prediction.

save_to_file(npz_path)#

Save prediction as npz file.

Parameters:

npz_path (Path) – Path to the npz file.

Return type:

None

class ResultsAggregator(initial=None)#

Bases: object

Aggregate results.

Parameters:

initial (DataFrame | None) –

append_df(data_frame, *, prepend=False)#

Append (or prepend) a DataFrame to this object.

Parameters:
  • data_frame (DataFrame) – DataFrame to append.

  • prepend (bool) – Whether to prepend or append the dataframe. (Default: False)

Return type:

None

append_from_csv(csv_file, *, prepend=False)#

Append results from a CSV file.

Parameters:
  • csv_file (Path) – Path to the CSV file.

  • prepend (bool) – (Default: False)

Returns:

True if the file existed and was successfully loaded; False otherwise.

Return type:

bool

property results: Results#

Results object over which this class is aggregating.

save_as_csv(file_path)#

Save to csv.

Parameters:

file_path (Path) – Path to the CSV file.

Return type:

None
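The accumulation that `append_df` performs can be sketched in plain pandas: collect per-run result frames and concatenate them (with `prepend=True` the new frame would go in front instead). The frame contents here are made up for illustration:

```python
import pandas as pd

frames: list[pd.DataFrame] = []

# Two hypothetical per-run result frames.
run1 = pd.DataFrame({"model": ["lr"], "accuracy": [0.81]})
run2 = pd.DataFrame({"model": ["svm"], "accuracy": [0.84]})

frames.append(run1)   # append_df(run1)
frames.append(run2)   # append_df(run2)

# The aggregated results, as one dataframe.
results = pd.concat(frames, ignore_index=True)
```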

class SoftPrediction(soft, info=None)#

Bases: Prediction

Prediction of an algorithm that makes soft predictions.

Parameters:
  • soft (ndarray) –

  • info (Dict[str, bool | int | float | str] | None) –

__len__()#

Length of the predictions object.

Return type:

int

static from_file(npz_path)#

Load prediction from npz file.

Parameters:

npz_path (Path) – Path to the npz file.

Returns:

A Prediction object with the loaded data.

Return type:

Prediction

classmethod from_np(preds)#

Construct a prediction object from a numpy array.

Parameters:

preds (ndarray[Any, dtype[_ScalarType_co]]) –

Return type:

Self

get_s_subset(s_data, s)#

Return a subset of the predictions where S=s.

Parameters:
  • s_data (Series) – Dataframe with the s-values.

  • s (int) – S-value to get the subset for.

Returns:

The requested subset as a new Prediction object.

Return type:

Prediction

property hard: Series#

Hard predictions (e.g. 0 and 1).

property info: Dict[str, bool | int | float | str]#

Additional info about the prediction.

save_to_file(npz_path)#

Save prediction as npz file.

Parameters:

npz_path (Path) – Path to the npz file.

Return type:

None

property soft: ndarray#

Soft predictions (e.g. 0.2 and 0.8).

class SubgroupTuple(data, s_column, s_in_x, name)#

Bases: SubsetMixin

A tuple of dataframes for the features and the sensitive attribute.

Parameters:
  • data (DataFrame) –

  • s_column (str) –

  • s_in_x (bool) –

  • name (str | None) –

__len__()#

Number of entries in the underlying data.

Return type:

int

classmethod from_df(*, x, s, name=None)#

Make a SubgroupTuple.

Parameters:
  • x (DataFrame) –

  • s (pd.Series[int]) –

  • name (str | None) –

Return type:

Self

classmethod from_file(data_path)#

Load test tuple from npz file.

Parameters:

data_path (Path) – Path to load the npz file.

Returns:

A SubgroupTuple with the loaded data.

Return type:

Self

get_n_samples(num=500)#

Get the first elements of the dataset.

Parameters:

num (int) – How many samples to take for subset. (Default: 500)

Returns:

Subset of training data.

Return type:

Self

get_s_subset(s)#

Return a subset of the SubgroupTuple where S=s.

Parameters:

s (int) –

Return type:

Self

rename(name)#

Change only the name.

Parameters:

name (str) –

Return type:

Self

replace(*, x=None, s=None)#

Create a copy of the SubgroupTuple but change the given values.

Parameters:
  • x (DataFrame | None) –

  • s (Series | None) –

Return type:

Self

replace_data(data, name=None)#

Make a copy of the SubgroupTuple but change the underlying data.

Parameters:
  • data (DataFrame) –

  • name (str | None) –

Return type:

Self

property s: pd.Series[int]#

Getter for property s.

save_to_file(data_path)#

Save SubgroupTuple as an npz file.

Parameters:

data_path (Path) – Path to save the npz file.

Return type:

None

property x: DataFrame#

Getter for property x.

class TrainTestPair(train, test)#

Bases: NamedTuple

2-Tuple of train and test data.

Parameters:
  • train (DataTuple) –

  • test (SubgroupTuple) –

__len__()#

Return len(self).

count(value, /)#

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)#

Return first index of value.

Raises ValueError if the value is not present.

test: SubgroupTuple#

Alias for field number 1

train: DataTuple#

Alias for field number 0

class TrainValPair(train, test)#

Bases: NamedTuple

2-Tuple of train and validation data.

Parameters:
  • train (DataTuple) –

  • test (DataTuple) –

__len__()#

Return len(self).

count(value, /)#

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)#

Return first index of value.

Raises ValueError if the value is not present.

test: DataTuple#

Alias for field number 1

train: DataTuple#

Alias for field number 0

aggregate_results(results, metrics, aggregator=('mean', 'std'))#

Aggregate results over the repeats.

Parameters:
  • results (Results) – Results object containing the results to aggregate.

  • metrics (list[str]) – Metrics used for aggregation.

  • aggregator (str | tuple[str, ...]) – Aggregator to use. The aggregators are the ones supported by pandas. (Default: (“mean”, “std”))

Returns:

The aggregated results as a pd.DataFrame.

Return type:

DataFrame
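The aggregation performed here corresponds to a pandas groupby-and-aggregate over the repeats. A minimal sketch with made-up numbers (not this library's actual Results layout):

```python
import pandas as pd

# Two repeats each for two hypothetical models.
results = pd.DataFrame(
    {"model": ["lr", "lr", "svm", "svm"], "accuracy": [0.80, 0.82, 0.84, 0.86]}
)

# Apply the requested aggregators ("mean", "std") to the metric per model.
agg = results.groupby("model")["accuracy"].agg(["mean", "std"])
```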

concat(datatup_list, *, ignore_index=False)#

Concatenate the data tuples in the given list.

Parameters:
  • datatup_list (Sequence[T]) – List of data tuples to concatenate.

  • ignore_index (bool) – Ignore the index of the dataframes. (Default: False)

Returns:

The concatenated data tuple.

Return type:

T
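At its core, `concat` stacks the underlying dataframes of the given tuples; `ignore_index=True` renumbers the rows from zero. A plain-pandas sketch of that operation:

```python
import pandas as pd

# Underlying dataframes of two hypothetical data tuples.
part1 = pd.DataFrame({"x": [1, 2], "s": [0, 1], "y": [1, 0]})
part2 = pd.DataFrame({"x": [3], "s": [0], "y": [1]})

# Stack them; ignore_index=True gives the result a fresh 0..n-1 index.
combined = pd.concat([part1, part2], ignore_index=True)
```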

filter_and_map_results(results, mapping)#

Filter entries and change the index with a mapping.

Parameters:
  • results (Results) – Results object to filter.

  • mapping (Mapping[str, str]) – Mapping from old index to new index.

Returns:

The filtered and mapped results.

Return type:

Results

filter_results(results, values, index='model')#

Filter the entries based on the given values.

Parameters:
  • results (Results) – Results object to filter.

  • values (Iterable) – Values to filter on.

  • index (str | PandasIndex) – Index to filter on. (Default: “model”)

Returns:

The filtered results.

Return type:

Results
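In plain pandas terms, this keeps only the rows whose value in the chosen index level appears in the given set of values. A sketch with a made-up results frame:

```python
import pandas as pd

# Hypothetical results indexed by model name.
results = pd.DataFrame(
    {"model": ["lr", "svm", "gbt"], "accuracy": [0.81, 0.84, 0.85]}
).set_index("model")

# filter_results(results, ["lr", "svm"], index="model"), conceptually:
filtered = results.loc[results.index.isin(["lr", "svm"])]
```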

make_results(data_frame=None)#

Initialise Results object.

You should always use this function instead of using the “constructor” directly, because this function checks whether the columns are correct.

Parameters:

data_frame (None | DataFrame | Path) – A dataframe to use for initialization. (Default: None)

Returns:

An initialised Results object.

Return type:

Results

map_over_results_index(results, mapper)#

Change the values of the index with a transformation function.

Parameters:
  • results (Results) –

  • mapper (Callable[[tuple[str, str, str, str, str]], tuple[str, str, str, str, str]]) –

Return type:

Results

shuffle_df(df, random_state)#

Shuffle a given dataframe.

Parameters:
  • df (DataFrame) –

  • random_state (int) –

Return type:

DataFrame
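Conceptually, this is a seeded full-frame sample followed by an index reset, which can be reproduced with plain pandas:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})

# Sample all rows in random (but seeded, hence reproducible) order.
shuffled = df.sample(frac=1.0, random_state=42).reset_index(drop=True)
```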

undo_one_hot(df, new_column_name=None)#

Undo one-hot encoding.

Parameters:
  • df (DataFrame) –

  • new_column_name (str | None) –

Return type:

Series | DataFrame
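Undoing one-hot encoding can be sketched with plain pandas: the recovered category for each row is the column holding the 1, which `idxmax` picks out. The column names here are made up for illustration:

```python
import pandas as pd

# A one-hot encoded categorical column with three categories.
one_hot = pd.DataFrame(
    {"color_red": [1, 0, 0], "color_blue": [0, 1, 0], "color_green": [0, 0, 1]}
)

# idxmax(axis=1) returns the name of the column with the 1 in each row;
# stripping the shared prefix recovers the original category labels.
recovered = one_hot.idxmax(axis=1).str.replace("color_", "", regex=False)
```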

Aliases#

class Results#

Container for results from evaluate_models().

alias of DataFrame

TestTuple#

Union of SubgroupTuple and DataTuple.

EvalTuple#

Union of LabelTuple and DataTuple.

HyperParamValue#

alias of Union[bool, int, float, str]

HyperParamType#

alias of Dict[str, Union[bool, int, float, str]]