Utils#

Module for useful things that don’t really belong anywhere else (just yet).

Classes:

Activation

Base class for decision functions.

ClassifierType

Classifier type.

DataTuple

A tuple of dataframes for the features, the sensitive attribute and the class labels.

FairnessType

Fairness type.

Heaviside

Decision function that accepts predictions with a score of 50% or above.

KernelType

Values for SVM Kernel.

LabelTuple

A tuple of dataframes for the sensitive attribute and the class labels.

ModelType

What to use as the underlying model for the fairness method.

Prediction

Prediction of an algorithm.

ResultsAggregator

Aggregate results.

SoftPrediction

Prediction of an algorithm that makes soft predictions.

SubgroupTuple

A tuple of dataframes for the features and the sensitive attribute.

TrainTestPair

2-Tuple of train and test data.

TrainValPair

2-Tuple of train and validation data.

Functions:

aggregate_results

Aggregate results over the repeats.

concat

Concatenate the data tuples in the given list.

filter_and_map_results

Filter entries and change the index with a mapping.

filter_results

Filter the entries based on the given values.

make_results

Initialise Results object.

map_over_results_index

Change the values of the index with a transformation function.

shuffle_df

Shuffle a given dataframe.

undo_one_hot

Undo one-hot encoding.

class Activation#

Bases: ABC

Base class for decision functions.

abstract apply(soft_output)#

Apply the decision function to a soft prediction.

Parameters:

soft_output (ndarray) – soft prediction (i.e. a probability or logits)

Returns:

decision

Return type:

ndarray

abstract get_name()#

Name of activation function.

Return type:

str
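To illustrate the interface, here is a sketch of a custom decision function. The `Activation` base class is re-created locally so the example is self-contained; the `Threshold` subclass and its cutoff parameter are hypothetical, not part of this module:

```python
import numpy as np
from abc import ABC, abstractmethod

# Minimal re-creation of the Activation interface for illustration only;
# the real base class is the one documented in this module.
class Activation(ABC):
    @abstractmethod
    def apply(self, soft_output: np.ndarray) -> np.ndarray: ...

    @abstractmethod
    def get_name(self) -> str: ...

class Threshold(Activation):
    """Hypothetical activation that accepts scores at or above a custom cutoff."""

    def __init__(self, cutoff: float = 0.5):
        self.cutoff = cutoff

    def apply(self, soft_output: np.ndarray) -> np.ndarray:
        # Turn soft scores into hard 0/1 decisions.
        return (soft_output >= self.cutoff).astype(np.int64)

    def get_name(self) -> str:
        return f"Threshold({self.cutoff})"

preds = Threshold(0.7).apply(np.array([0.2, 0.7, 0.9]))
```

With a cutoff of 0.5 this reduces to the behaviour described for `Heaviside` below.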

class ClassifierType(value)#

Bases: StrEnum

Classifier type.

gbt = 'gbt'#

Gradient Boosting.

lr = 'lr'#

Logistic Regression.

svm = 'svm'#

Support Vector Machine.

class DataTuple(data, s_column, y_column, s_in_x, name)#

Bases: SubsetMixin

A tuple of dataframes for the features, the sensitive attribute and the class labels.

Parameters:
  • data (DataFrame) –

  • s_column (str) –

  • y_column (str) –

  • s_in_x (bool) –

  • name (str | None) –

__len__()#

Number of entries in the underlying data.

Return type:

int

apply_to_joined_df(mapper)#

Concatenate the dataframes in the DataTuple and then apply a function to it.

Parameters:

mapper (Callable[[DataFrame], DataFrame]) – A function that takes a dataframe and returns a dataframe.

Returns:

The transformed DataTuple.

Return type:

Self

classmethod from_df(*, x, s, y, name=None)#

Make a DataTuple.

Parameters:
  • x (DataFrame) –

  • s (pd.Series[int]) –

  • y (pd.Series[int]) –

  • name (str | None) –

Return type:

Self

classmethod from_file(data_path)#

Load data tuple from npz file.

Parameters:

data_path (Path) – Path to the npz file.

Returns:

A DataTuple with the loaded data.

Return type:

Self

get_n_samples(num=500)#

Get the first elements of the dataset.

Parameters:

num (int) – How many samples to take for subset. (Default: 500)

Returns:

Subset of training data.

Return type:

Self

get_s_subset(s)#

Return a subset of the DataTuple where S=s.

Parameters:

s (int) –

Return type:

Self

remove_y()#

Convert the DataTuple instance to a SubgroupTuple instance.

Return type:

SubgroupTuple

rename(name)#

Change only the name.

Parameters:

name (str) –

Return type:

Self

replace(*, x=None, s=None, y=None)#

Create a copy of the DataTuple but change the given values.

Parameters:
  • x (DataFrame | None) –

  • s (Series | None) –

  • y (Series | None) –

Return type:

Self

replace_data(data, name=None)#

Make a copy of the DataTuple but change the underlying data.

Parameters:
  • data (DataFrame) –

  • name (str | None) –

Return type:

Self

property s: pd.Series[int]#

Getter for property s.

save_to_file(data_path)#

Save DataTuple as an npz file.

Parameters:

data_path (Path) – Path to the npz file.

Return type:

None

property x: DataFrame#

Getter for property x.

property y: pd.Series[int]#

Getter for property y.
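The three pieces a `DataTuple` bundles can be sketched in plain pandas. This is a conceptual illustration of what `from_df` packages and what `get_s_subset` selects, not the class's actual internals:

```python
import pandas as pd

# Features x, sensitive attribute s, and class labels y -- the inputs to from_df.
x = pd.DataFrame({"age": [25, 32, 47], "hours": [40, 38, 45]})
s = pd.Series([0, 1, 0], name="s")   # sensitive attribute
y = pd.Series([1, 0, 1], name="y")   # class labels

# Conceptually, the tuple stores these as one joined dataframe...
data = pd.concat([x, s, y], axis=1)

# ...from which a per-group subset (what get_s_subset(0) returns) is a row filter.
subset = data[data["s"] == 0]
```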

class FairnessType(value)#

Bases: StrEnum

Fairness type.

dp = 'dp'#

Demographic parity.

eq_odds = 'eq_odds'#

Equalized Odds.

eq_opp = 'eq_opp'#

Equality of Opportunity.
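As a quick reminder of what demographic parity measures, the acceptance rates being compared can be computed in plain pandas (this is an illustration of the definition, not code from this library):

```python
import pandas as pd

# Demographic parity compares acceptance rates across groups:
# P(y_hat = 1 | s = 0) vs. P(y_hat = 1 | s = 1).
preds = pd.Series([1, 0, 1, 1])  # hard predictions
s = pd.Series([0, 0, 1, 1])      # sensitive attribute

rate_by_group = preds.groupby(s).mean()          # acceptance rate per group
dp_gap = abs(rate_by_group[0] - rate_by_group[1])  # 0 means parity holds
```

Equalized odds and equality of opportunity are analogous but condition additionally on the true label y.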

class Heaviside#

Bases: Activation

Decision function that accepts predictions with a score of 50% or above.

apply(soft_output)#

Apply the decision function to each element of an ndarray.

Parameters:

soft_output (ndarray) – Soft predictions.

Returns:

Hard predictions.

Return type:

ndarray

get_name()#

Getter for name of decision function.

Return type:

str

class KernelType(value)#

Bases: StrEnum

Values for SVM Kernel.

linear = 'linear'#

Linear kernel.

poly = 'poly'#

Polynomial kernel.

rbf = 'rbf'#

Radial basis function kernel.

sigmoid = 'sigmoid'#

Sigmoid kernel.

class LabelTuple(data, s_column, y_column, name)#

Bases: SubsetMixin

A tuple of dataframes for the sensitive attribute and the class labels.

Parameters:
  • data (DataFrame) –

  • s_column (str) –

  • y_column (str) –

  • name (str | None) –

__len__()#

Number of entries in the underlying data.

Return type:

int

classmethod from_df(*, s, y, name=None)#

Make a LabelTuple.

Parameters:
  • s (pd.Series[int]) –

  • y (pd.Series[int]) –

  • name (str | None) –

Return type:

Self

classmethod from_np(*, s, y, s_name='s', y_name='y')#

Create a LabelTuple from numpy arrays.

Parameters:
  • s (ndarray[Any, dtype[_ScalarType_co]]) –

  • y (ndarray[Any, dtype[_ScalarType_co]]) –

  • s_name (str) –

  • y_name (str) –

Return type:

Self

get_n_samples(num=500)#

Get the first elements of the dataset.

Parameters:

num (int) – How many samples to take for subset. (Default: 500)

Returns:

Subset of training data.

Return type:

Self

get_s_subset(s)#

Return a subset of the LabelTuple where S=s.

Parameters:

s (int) –

Return type:

Self

rename(name)#

Change only the name.

Parameters:

name (str) –

Return type:

Self

replace(*, s=None, y=None)#

Create a copy of the LabelTuple but change the given values.

Parameters:
  • s (Series | None) –

  • y (Series | None) –

Return type:

Self

replace_data(data, name=None)#

Make a copy of the LabelTuple but change the underlying data.

Parameters:
  • data (DataFrame) –

  • name (str | None) –

Return type:

Self

property s: pd.Series[int]#

Getter for property s.

property y: pd.Series[int]#

Getter for property y.

class ModelType(value)#

Bases: StrEnum

What to use as the underlying model for the fairness method.

deep = 'deep'#

Deep neural network.

linear = 'linear'#

Linear model.

class Prediction(hard, info=None)#

Bases: object

Prediction of an algorithm.

Parameters:
  • hard (Series) –

  • info (Dict[str, bool | int | float | str] | None) –

__len__()#

Length of the predictions object.

Return type:

int

static from_file(npz_path)#

Load prediction from npz file.

Parameters:

npz_path (Path) – Path to the npz file.

Returns:

A Prediction object with the loaded data.

Return type:

Prediction

classmethod from_np(preds)#

Construct a prediction object from a numpy array.

Parameters:

preds (ndarray[Any, dtype[_ScalarType_co]]) –

Return type:

Self

get_s_subset(s_data, s)#

Return a subset of the predictions where S=s.

Parameters:
  • s_data (Series) – Dataframe with the s-values.

  • s (int) – S-value to get the subset for.

Returns:

The requested subset as a new Prediction object.

Return type:

Prediction
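The semantics of `get_s_subset` can be sketched with plain pandas: keep only those predictions whose corresponding sensitive-attribute value equals s. This is an illustration, not the method's actual implementation:

```python
import pandas as pd

hard = pd.Series([1, 0, 1, 1], name="preds")  # hard predictions
s_data = pd.Series([0, 1, 0, 1], name="s")    # aligned sensitive attribute

# Boolean row filter -- what the requested subset contains for s = 0.
subset = hard[s_data == 0]
```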

property hard: Series#

Hard predictions (e.g. 0 and 1).

property info: Dict[str, bool | int | float | str]#

Additional info about the prediction.

save_to_file(npz_path)#

Save prediction as npz file.

Parameters:

npz_path (Path) – Path to the npz file.

Return type:

None

class ResultsAggregator(initial=None)#

Bases: object

Aggregate results.

Parameters:

initial (DataFrame | None) –

append_df(data_frame, *, prepend=False)#

Append (or prepend) a DataFrame to this object.

Parameters:
  • data_frame (DataFrame) – DataFrame to append.

  • prepend (bool) – Whether to prepend or append the dataframe. (Default: False)

Return type:

None

append_from_csv(csv_file, *, prepend=False)#

Append results from a CSV file.

Parameters:
  • csv_file (Path) – Path to the CSV file.

  • prepend (bool) – (Default: False)

Returns:

True if the file existed and was successfully loaded; False otherwise.

Return type:

bool

property results: Results#

Results object over which this class is aggregating.

save_as_csv(file_path)#

Save to csv.

Parameters:

file_path (Path) – Path to the CSV file.

Return type:

None
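The accumulation that `append_df` performs can be sketched in plain pandas: collect per-run result frames and concatenate them (with `prepend=True` the new frame would go in front instead). The frame contents here are made up for illustration:

```python
import pandas as pd

frames: list[pd.DataFrame] = []

# Two hypothetical per-run result frames.
run1 = pd.DataFrame({"model": ["lr"], "accuracy": [0.81]})
run2 = pd.DataFrame({"model": ["svm"], "accuracy": [0.84]})

frames.append(run1)   # append_df(run1)
frames.append(run2)   # append_df(run2)

# The aggregated results, as one dataframe.
results = pd.concat(frames, ignore_index=True)
```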

class SoftPrediction(soft, info=None)#

Bases: Prediction

Prediction of an algorithm that makes soft predictions.

Parameters:
  • soft (ndarray) –

  • info (Dict[str, bool | int | float | str] | None) –

__len__()#

Length of the predictions object.

Return type:

int

static from_file(npz_path)#

Load prediction from npz file.

Parameters:

npz_path (Path) – Path to the npz file.

Returns:

A Prediction object with the loaded data.

Return type:

Prediction

classmethod from_np(preds)#

Construct a prediction object from a numpy array.

Parameters:

preds (ndarray[Any, dtype[_ScalarType_co]]) –

Return type:

Self

get_s_subset(s_data, s)#

Return a subset of the predictions where S=s.

Parameters:
  • s_data (Series) – Dataframe with the s-values.

  • s (int) – S-value to get the subset for.

Returns:

The requested subset as a new Prediction object.

Return type:

Prediction

property hard: Series#

Hard predictions (e.g. 0 and 1).

property info: Dict[str, bool | int | float | str]#

Additional info about the prediction.

save_to_file(npz_path)#

Save prediction as npz file.

Parameters:

npz_path (Path) – Path to the npz file.

Return type:

None

property soft: ndarray#

Soft predictions (e.g. 0.2 and 0.8).

class SubgroupTuple(data, s_column, s_in_x, name)#

Bases: SubsetMixin

A tuple of dataframes for the features and the sensitive attribute.

Parameters:
  • data (DataFrame) –

  • s_column (str) –

  • s_in_x (bool) –

  • name (str | None) –

__len__()#

Number of entries in the underlying data.

Return type:

int

classmethod from_df(*, x, s, name=None)#

Make a SubgroupTuple.

Parameters:
  • x (DataFrame) –

  • s (pd.Series[int]) –

  • name (str | None) –

Return type:

Self

classmethod from_file(data_path)#

Load test tuple from npz file.

Parameters:

data_path (Path) – Path to load the npz file.

Returns:

A SubgroupTuple with the loaded data.

Return type:

Self

get_n_samples(num=500)#

Get the first elements of the dataset.

Parameters:

num (int) – How many samples to take for subset. (Default: 500)

Returns:

Subset of training data.

Return type:

Self

get_s_subset(s)#

Return a subset of the SubgroupTuple where S=s.

Parameters:

s (int) –

Return type:

Self

rename(name)#

Change only the name.

Parameters:

name (str) –

Return type:

Self

replace(*, x=None, s=None)#

Create a copy of the SubgroupTuple but change the given values.

Parameters:
  • x (DataFrame | None) –

  • s (Series | None) –

Return type:

Self

replace_data(data, name=None)#

Make a copy of the SubgroupTuple but change the underlying data.

Parameters:
  • data (DataFrame) –

  • name (str | None) –

Return type:

Self

property s: pd.Series[int]#

Getter for property s.

save_to_file(data_path)#

Save SubgroupTuple as an npz file.

Parameters:

data_path (Path) – Path to save the npz file.

Return type:

None

property x: DataFrame#

Getter for property x.

class TrainTestPair(train, test)#

Bases: NamedTuple

2-Tuple of train and test data.

Parameters:
  • train (DataTuple) –

  • test (SubgroupTuple) –

__len__()#

Return len(self).

count(value, /)#

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)#

Return first index of value.

Raises ValueError if the value is not present.

test: SubgroupTuple#

Alias for field number 1

train: DataTuple#

Alias for field number 0

class TrainValPair(train, test)#

Bases: NamedTuple

2-Tuple of train and validation data.

Parameters:
  • train (DataTuple) –

  • test (DataTuple) –

__len__()#

Return len(self).

count(value, /)#

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)#

Return first index of value.

Raises ValueError if the value is not present.

test: DataTuple#

Alias for field number 1

train: DataTuple#

Alias for field number 0

aggregate_results(results, metrics, aggregator=('mean', 'std'))#

Aggregate results over the repeats.

Parameters:
  • results (Results) – Results object containing the results to aggregate.

  • metrics (list[str]) – Metrics used for aggregation.

  • aggregator (str | tuple[str, ...]) – Aggregator to use. The aggregators are the ones supported by pandas. (Default: (“mean”, “std”))

Returns:

The aggregated results as a pd.DataFrame.

Return type:

DataFrame
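The aggregation performed here corresponds to a pandas groupby-and-aggregate over the repeats. A minimal sketch with made-up numbers (not this library's actual Results layout):

```python
import pandas as pd

# Two repeats each for two hypothetical models.
results = pd.DataFrame(
    {"model": ["lr", "lr", "svm", "svm"], "accuracy": [0.80, 0.82, 0.84, 0.86]}
)

# Apply the requested aggregators ("mean", "std") to the metric per model.
agg = results.groupby("model")["accuracy"].agg(["mean", "std"])
```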

concat(datatup_list, *, ignore_index=False)#

Concatenate the data tuples in the given list.

Parameters:
  • datatup_list (Sequence[T]) – List of data tuples to concatenate.

  • ignore_index (bool) – Ignore the index of the dataframes. (Default: False)

Returns:

The concatenated data tuple.

Return type:

T
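At its core, `concat` stacks the underlying dataframes of the given tuples; `ignore_index=True` renumbers the rows from zero. A plain-pandas sketch of that operation:

```python
import pandas as pd

# Underlying dataframes of two hypothetical data tuples.
part1 = pd.DataFrame({"x": [1, 2], "s": [0, 1], "y": [1, 0]})
part2 = pd.DataFrame({"x": [3], "s": [0], "y": [1]})

# Stack them; ignore_index=True gives the result a fresh 0..n-1 index.
combined = pd.concat([part1, part2], ignore_index=True)
```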

filter_and_map_results(results, mapping)#

Filter entries and change the index with a mapping.

Parameters:
  • results (Results) – Results object to filter.

  • mapping (Mapping[str, str]) – Mapping from old index to new index.

Returns:

The filtered and mapped results.

Return type:

Results

filter_results(results, values, index='model')#

Filter the entries based on the given values.

Parameters:
  • results (Results) – Results object to filter.

  • values (Iterable) – Values to filter on.

  • index (str | PandasIndex) – Index to filter on. (Default: “model”)

Returns:

The filtered results.

Return type:

Results
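In plain pandas terms, this keeps only the rows whose value in the chosen index level appears in the given set of values. A sketch with a made-up results frame:

```python
import pandas as pd

# Hypothetical results indexed by model name.
results = pd.DataFrame(
    {"model": ["lr", "svm", "gbt"], "accuracy": [0.81, 0.84, 0.85]}
).set_index("model")

# filter_results(results, ["lr", "svm"], index="model"), conceptually:
filtered = results.loc[results.index.isin(["lr", "svm"])]
```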

make_results(data_frame=None)#

Initialise Results object.

You should always use this function instead of using the “constructor” directly, because this function checks whether the columns are correct.

Parameters:

data_frame (None | DataFrame | Path) – A dataframe to use for initialization. (Default: None)

Returns:

An initialised Results object.

Return type:

Results

map_over_results_index(results, mapper)#

Change the values of the index with a transformation function.

Parameters:
  • results (Results) –

  • mapper (Callable[[tuple[str, str, str, str, str]], tuple[str, str, str, str, str]]) –

Return type:

Results

shuffle_df(df, random_state)#

Shuffle a given dataframe.

Parameters:
  • df (DataFrame) –

  • random_state (int) –

Return type:

DataFrame
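Conceptually, this is a seeded full-frame sample followed by an index reset, which can be reproduced with plain pandas:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})

# Sample all rows in random (but seeded, hence reproducible) order.
shuffled = df.sample(frac=1.0, random_state=42).reset_index(drop=True)
```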

undo_one_hot(df, new_column_name=None)#

Undo one-hot encoding.

Parameters:
  • df (DataFrame) –

  • new_column_name (str | None) –

Return type:

Series | DataFrame
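Undoing one-hot encoding can be sketched with plain pandas: the recovered category for each row is the column holding the 1, which `idxmax` picks out. The column names here are made up for illustration:

```python
import pandas as pd

# A one-hot encoded categorical column with three categories.
one_hot = pd.DataFrame(
    {"color_red": [1, 0, 0], "color_blue": [0, 1, 0], "color_green": [0, 0, 1]}
)

# idxmax(axis=1) returns the name of the column with the 1 in each row;
# stripping the shared prefix recovers the original category labels.
recovered = one_hot.idxmax(axis=1).str.replace("color_", "", regex=False)
```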

Aliases#

class Results#

Container for results from evaluate_models().

alias of DataFrame

TestTuple#

Union of SubgroupTuple and DataTuple.

EvalTuple#

Union of LabelTuple and DataTuple.

HyperParamValue#

alias of Union[bool, int, float, str]

HyperParamType#

alias of Dict[str, Union[bool, int, float, str]]