Utils#
Module for kind of useful things that don’t really belong anywhere else (just yet).
Classes:

- Activation – Base class for decision functions.
- ClassifierType – Classifier type.
- DataTuple – A tuple of dataframes for the features, the sensitive attribute and the class labels.
- FairnessType – Fairness type.
- Heaviside – Decision function that accepts predictions with score of 50% or above.
- KernelType – Values for SVM Kernel.
- LabelTuple – A tuple of dataframes for the sensitive attribute and the class labels.
- ModelType – What to use as the underlying model for the fairness method.
- Prediction – Prediction of an algorithm.
- ResultsAggregator – Aggregate results.
- SoftPrediction – Prediction of an algorithm that makes soft predictions.
- SubgroupTuple – A tuple of dataframes for the features and the sensitive attribute.
- TrainTestPair – 2-Tuple of train and test data.
- TrainValPair – 2-Tuple of train and validation data.

Functions:

- aggregate_results – Aggregate results over the repeats.
- concat – Concatenate the data tuples in the given list.
- filter_and_map_results – Filter entries and change the index with a mapping.
- filter_results – Filter the entries based on the given values.
- make_results – Initialise Results object.
- map_over_results_index – Change the values of the index with a transformation function.
- shuffle_df – Shuffle a given dataframe.
- undo_one_hot – Undo one-hot encoding.
- class Activation#
Bases:
ABC
Base class for decision functions.
- abstract apply(soft_output)#
Apply the decision function to a soft prediction.
- Parameters:
soft_output (ndarray) – soft prediction (i.e. a probability or logits)
- Returns:
decision
- Return type:
ndarray
- abstract get_name()#
Name of activation function.
- Return type:
str
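For illustration, a minimal sketch of a custom decision function built on the abstract API above (apply and get_name). The import path, the class name and the 0.7 threshold are assumptions for this example, not part of the library:

```python
import numpy as np

from ethicml import Activation  # assumed import path


class StrictThreshold(Activation):
    """Hypothetical decision function that only accepts scores of 70% or above."""

    def apply(self, soft_output: np.ndarray) -> np.ndarray:
        # Turn soft scores into hard 0/1 decisions at the (assumed) 0.7 cut-off.
        return (soft_output >= 0.7).astype(int)

    def get_name(self) -> str:
        return "strict_threshold"


hard = StrictThreshold().apply(np.array([0.55, 0.71, 0.90]))  # -> array([0, 1, 1])
```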
- class ClassifierType(value)#
Bases:
StrEnum
Classifier type.
- gbt = 'gbt'#
Gradient Boosting.
- lr = 'lr'#
Logistic Regression.
- svm = 'svm'#
Support Vector Machine.
- class DataTuple(data, s_column, y_column, s_in_x, name)#
Bases:
SubsetMixin
A tuple of dataframes for the features, the sensitive attribute and the class labels.
- Parameters:
data (DataFrame) –
s_column (str) –
y_column (str) –
s_in_x (bool) –
name (str | None) –
- __len__()#
Number of entries in the underlying data.
- Return type:
int
- apply_to_joined_df(mapper)#
Concatenate the dataframes in the DataTuple and then apply a function to it.
- Parameters:
mapper (Callable[[DataFrame], DataFrame]) – A function that takes a dataframe and returns a dataframe.
- Returns:
The transformed DataTuple.
- Return type:
Self
- classmethod from_df(*, x, s, y, name=None)#
Make a DataTuple.
- Parameters:
x (DataFrame) –
s (pd.Series[int]) –
y (pd.Series[int]) –
name (str | None) –
- Return type:
Self
- classmethod from_file(data_path)#
Load data tuple from npz file.
- Parameters:
data_path (Path) – Path to the npz file.
- Returns:
A DataTuple with the loaded data.
- Return type:
Self
- get_n_samples(num=500)#
Get the first elements of the dataset.
- Parameters:
num (int) – How many samples to take for subset. (Default: 500)
- Returns:
Subset of training data.
- Return type:
Self
- get_s_subset(s)#
Return a subset of the DataTuple where S=s.
- Parameters:
s (int) –
- Return type:
Self
- remove_y()#
Convert the DataTuple instance to a SubgroupTuple instance.
- Return type:
SubgroupTuple
- rename(name)#
Change only the name.
- Parameters:
name (str) –
- Return type:
Self
- replace(*, x=None, s=None, y=None)#
Create a copy of the DataTuple but change the given values.
- Parameters:
x (DataFrame | None) –
s (Series | None) –
y (Series | None) –
- Return type:
Self
- replace_data(data, name=None)#
Make a copy of the DataTuple but change the underlying data.
- Parameters:
data (DataFrame) –
name (str | None) –
- Return type:
Self
- property s: pd.Series[int]#
Getter for property s.
- save_to_file(data_path)#
Save DataTuple as an npz file.
- Parameters:
data_path (Path) – Path to the npz file.
- Return type:
None
- property x: DataFrame#
Getter for property x.
- property y: pd.Series[int]#
Getter for property y.
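A short sketch of the typical round trip with the API documented above: build a DataTuple from pandas objects, inspect its properties, and take a subgroup. The import path and the toy column names are illustrative assumptions:

```python
import pandas as pd

from ethicml import DataTuple  # assumed import path

# Toy data: two features, a binary sensitive attribute and binary class labels.
x = pd.DataFrame({"feat1": [0.1, 0.4, 0.3, 0.9], "feat2": [1.0, 0.2, 0.5, 0.7]})
s = pd.Series([0, 0, 1, 1], name="sens")
y = pd.Series([0, 1, 0, 1], name="label")

dt = DataTuple.from_df(x=x, s=s, y=y, name="toy")

print(len(dt))       # 4 entries
print(dt.x.shape)    # features only
print(dt.s.sum())    # sensitive attribute

subgroup = dt.get_s_subset(1)  # rows where S = 1
no_labels = dt.remove_y()      # SubgroupTuple holding only x and s
```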
- class FairnessType(value)#
Bases:
StrEnum
Fairness type.
- dp = 'dp'#
Demographic parity.
- eq_odds = 'eq_odds'#
Equalized Odds.
- eq_opp = 'eq_opp'#
Equality of Opportunity.
- class Heaviside#
Bases:
Activation
Decision function that accepts predictions with score of 50% or above.
- apply(soft_output)#
Apply the decision function to each element of an ndarray.
- Parameters:
soft_output (ndarray) – Soft predictions.
- Returns:
Hard predictions.
- Return type:
ndarray
- get_name()#
Getter for name of decision function.
- Return type:
str
- class KernelType(value)#
Bases:
StrEnum
Values for SVM Kernel.
- linear = 'linear'#
Linear kernel.
- poly = 'poly'#
Polynomial kernel.
- rbf = 'rbf'#
Radial basis function kernel.
- sigmoid = 'sigmoid'#
Sigmoid kernel.
- class LabelTuple(data, s_column, y_column, name)#
Bases:
SubsetMixin
A tuple of dataframes for the sensitive attribute and the class labels.
- Parameters:
data (DataFrame) –
s_column (str) –
y_column (str) –
name (str | None) –
- __len__()#
Number of entries in the underlying data.
- Return type:
int
- classmethod from_df(*, s, y, name=None)#
Make a LabelTuple.
- Parameters:
s (pd.Series[int]) –
y (pd.Series[int]) –
name (str | None) –
- Return type:
Self
- classmethod from_np(*, s, y, s_name='s', y_name='y')#
Create a LabelTuple from numpy arrays.
- Parameters:
s (ndarray[Any, dtype[_ScalarType_co]]) –
y (ndarray[Any, dtype[_ScalarType_co]]) –
s_name (str) –
y_name (str) –
- Return type:
Self
- get_n_samples(num=500)#
Get the first elements of the dataset.
- Parameters:
num (int) – How many samples to take for subset. (Default: 500)
- Returns:
Subset of training data.
- Return type:
Self
- get_s_subset(s)#
Return a subset of the LabelTuple where S=s.
- Parameters:
s (int) –
- Return type:
Self
- rename(name)#
Change only the name.
- Parameters:
name (str) –
- Return type:
Self
- replace(*, s=None, y=None)#
Create a copy of the LabelTuple but change the given values.
- Parameters:
s (Series | None) –
y (Series | None) –
- Return type:
Self
- replace_data(data, name=None)#
Make a copy of the LabelTuple but change the underlying data.
- Parameters:
data (DataFrame) –
name (str | None) –
- Return type:
Self
- property s: pd.Series[int]#
Getter for property s.
- property y: pd.Series[int]#
Getter for property y.
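LabelTuple holds only the sensitive attribute and the labels, which is what evaluation code typically needs as ground truth. A minimal construction sketch using the from_np constructor documented above (the import path is an assumption; the default column names come from the signature):

```python
import numpy as np

from ethicml import LabelTuple  # assumed import path

labels = LabelTuple.from_np(s=np.array([0, 0, 1, 1]), y=np.array([0, 1, 0, 1]))

print(len(labels))                   # 4
print(labels.s.name, labels.y.name)  # default names "s" and "y"
only_s1 = labels.get_s_subset(1)     # entries where S = 1
```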
- class ModelType(value)#
Bases:
StrEnum
What to use as the underlying model for the fairness method.
- deep = 'deep'#
Deep neural network.
- linear = 'linear'#
Linear model.
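Since ClassifierType, FairnessType, KernelType and ModelType above are all StrEnums, their members behave like plain strings, which makes them convenient for configs and logging. A small sketch (the import path is an assumption):

```python
from ethicml import ClassifierType, FairnessType, KernelType, ModelType  # assumed import path

clf = ClassifierType.svm
kernel = KernelType.rbf

# StrEnum members compare equal to their string values.
assert clf == "svm"
assert FairnessType.dp.value == "dp"

# Round-trip from a plain string, e.g. when reading a config file.
model = ModelType("linear")
print(f"classifier={clf}, kernel={kernel}, model={model}")
```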
- class Prediction(hard, info=None)#
Bases:
object
Prediction of an algorithm.
- Parameters:
hard (Series) –
info (Dict[str, bool | int | float | str] | None) –
- __len__()#
Length of the predictions object.
- Return type:
int
- static from_file(npz_path)#
Load prediction from npz file.
- Parameters:
npz_path (Path) – Path to the npz file.
- Returns:
A Prediction object with the loaded data.
- Return type:
Prediction
- classmethod from_np(preds)#
Construct a prediction object from a numpy array.
- Parameters:
preds (ndarray[Any, dtype[_ScalarType_co]]) –
- Return type:
Self
- get_s_subset(s_data, s)#
Return the subset of the predictions where S=s.
- Parameters:
s_data (Series) – Series with the s-values.
s (int) – S-value to get the subset for.
- Returns:
The requested subset as a new Prediction object.
- Return type:
Prediction
- property hard: Series#
Hard predictions (e.g. 0 and 1).
- property info: Dict[str, bool | int | float | str]#
Additional info about the prediction.
- save_to_file(npz_path)#
Save prediction as npz file.
- Parameters:
npz_path (Path) – Path to the npz file.
- Return type:
None
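A sketch of wrapping model output in a Prediction and round-tripping it through the documented npz helpers. The import path and file name are placeholders:

```python
from pathlib import Path

import numpy as np

from ethicml import Prediction  # assumed import path

preds = Prediction.from_np(np.array([0, 1, 1, 0]))
print(len(preds), preds.hard.tolist())  # 4 [0, 1, 1, 0]

# Save and reload via the documented npz helpers (placeholder path).
path = Path("preds.npz")
preds.save_to_file(path)
restored = Prediction.from_file(path)
```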
- class ResultsAggregator(initial=None)#
Bases:
object
Aggregate results.
- Parameters:
initial (DataFrame | None) –
- append_df(data_frame, *, prepend=False)#
Append (or prepend) a DataFrame to this object.
- Parameters:
data_frame (DataFrame) – DataFrame to append.
prepend (bool) – Whether to prepend or append the dataframe. (Default: False)
- Return type:
None
- append_from_csv(csv_file, *, prepend=False)#
Append results from a CSV file.
- Parameters:
csv_file (Path) – Path to the CSV file.
prepend (bool) – (Default: False)
- Returns:
True if the file existed and was successfully loaded; False otherwise.
- Return type:
bool
- save_as_csv(file_path)#
Save to CSV.
- Parameters:
file_path (Path) – Path to the CSV file.
- Return type:
None
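A sketch of how the aggregator might be used to collect result frames from several runs into one CSV. File names and the columns of the example DataFrame are placeholders, not requirements of the library:

```python
from pathlib import Path

import pandas as pd

from ethicml import ResultsAggregator  # assumed import path

aggregator = ResultsAggregator()

# Pull in results written to disk by earlier runs, if the files exist.
for csv_file in (Path("run_1.csv"), Path("run_2.csv")):
    loaded = aggregator.append_from_csv(csv_file)
    print(f"{csv_file}: {'loaded' if loaded else 'missing'}")

# Results from the current run can be appended (or prepended) directly.
current_run = pd.DataFrame({"model": ["svm"], "Accuracy": [0.87]})
aggregator.append_df(current_run)

aggregator.save_as_csv(Path("all_runs.csv"))
```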
- class SoftPrediction(soft, info=None)#
Bases:
Prediction
Prediction of an algorithm that makes soft predictions.
- Parameters:
soft (ndarray) –
info (Dict[str, bool | int | float | str] | None) –
- __len__()#
Length of the predictions object.
- Return type:
int
- static from_file(npz_path)#
Load prediction from npz file.
- Parameters:
npz_path (Path) – Path to the npz file.
- Returns:
A Prediction object with the loaded data.
- Return type:
Prediction
- classmethod from_np(preds)#
Construct a prediction object from a numpy array.
- Parameters:
preds (ndarray[Any, dtype[_ScalarType_co]]) –
- Return type:
Self
- get_s_subset(s_data, s)#
Return the subset of the predictions where S=s.
- Parameters:
s_data (Series) – Series with the s-values.
s (int) – S-value to get the subset for.
- Returns:
The requested subset as a new Prediction object.
- Return type:
Prediction
- property hard: Series#
Hard predictions (e.g. 0 and 1).
- property info: Dict[str, bool | int | float | str]#
Additional info about the prediction.
- save_to_file(npz_path)#
Save prediction as npz file.
- Parameters:
npz_path (Path) – Path to the npz file.
- Return type:
None
- property soft: ndarray#
Soft predictions (e.g. 0.2 and 0.8).
- class SubgroupTuple(data, s_column, s_in_x, name)#
Bases:
SubsetMixin
A tuple of dataframes for the features and the sensitive attribute.
- Parameters:
data (DataFrame) –
s_column (str) –
s_in_x (bool) –
name (str | None) –
- __len__()#
Number of entries in the underlying data.
- Return type:
int
- classmethod from_df(*, x, s, name=None)#
Make a SubgroupTuple.
- Parameters:
x (DataFrame) –
s (pd.Series[int]) –
name (str | None) –
- Return type:
Self
- classmethod from_file(data_path)#
Load test tuple from npz file.
- Parameters:
data_path (Path) – Path to load the npz file.
- Returns:
A SubgroupTuple with the loaded data.
- Return type:
Self
- get_n_samples(num=500)#
Get the first elements of the dataset.
- Parameters:
num (int) – How many samples to take for subset. (Default: 500)
- Returns:
Subset of training data.
- Return type:
Self
- get_s_subset(s)#
Return a subset of the SubgroupTuple where S=s.
- Parameters:
s (int) –
- Return type:
Self
- rename(name)#
Change only the name.
- Parameters:
name (str) –
- Return type:
Self
- replace(*, x=None, s=None)#
Create a copy of the SubgroupTuple but change the given values.
- Parameters:
x (DataFrame | None) –
s (Series | None) –
- Return type:
Self
- replace_data(data, name=None)#
Make a copy of the SubgroupTuple but change the underlying data.
- Parameters:
data (DataFrame) –
name (str | None) –
- Return type:
Self
- property s: pd.Series[int]#
Getter for property s.
- save_to_file(data_path)#
Save SubgroupTuple as an npz file.
- Parameters:
data_path (Path) – Path to save the npz file.
- Return type:
None
- property x: DataFrame#
Getter for property x.
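SubgroupTuple is what a prediction-time dataset looks like: features and sensitive attribute, but no labels. It can be built directly with from_df or obtained from a DataTuple via remove_y(). A short sketch (import path and column names assumed):

```python
import pandas as pd

from ethicml import SubgroupTuple  # assumed import path

x = pd.DataFrame({"feat1": [0.2, 0.8], "feat2": [1.5, 0.3]})
s = pd.Series([0, 1], name="sens")

test = SubgroupTuple.from_df(x=x, s=s, name="toy test")
print(len(test), list(test.x.columns))

only_s0 = test.get_s_subset(0)  # rows where S = 0
```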
- class TrainTestPair(train, test)#
Bases:
NamedTuple
2-Tuple of train and test data.
- Parameters:
train (DataTuple) –
test (SubgroupTuple) –
- __len__()#
Return len(self).
- count(value, /)#
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)#
Return first index of value.
Raises ValueError if the value is not present.
- test: SubgroupTuple#
Alias for field number 1
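TrainTestPair is a plain NamedTuple, so it supports tuple unpacking and field access. A sketch pairing a labelled training set with an unlabelled test set; the data is a toy reuse of the same rows purely for illustration (import path assumed):

```python
import pandas as pd

from ethicml import DataTuple, TrainTestPair  # assumed import path

x = pd.DataFrame({"feat1": [0.1, 0.4, 0.3, 0.9]})
s = pd.Series([0, 0, 1, 1], name="sens")
y = pd.Series([0, 1, 0, 1], name="label")
full = DataTuple.from_df(x=x, s=s, y=y)

# Toy pairing for illustration only: the test half is the same data with labels removed.
pair = TrainTestPair(train=full, test=full.remove_y())
train, test = pair        # plain NamedTuple unpacking
print(len(pair))          # 2: the two fields of the tuple
print(pair.test is test)  # True
```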
- class TrainValPair(train, test)#
Bases:
NamedTuple
2-Tuple of train and validation data.
- __len__()#
Return len(self).
- count(value, /)#
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)#
Return first index of value.
Raises ValueError if the value is not present.
- aggregate_results(results, metrics, aggregator=('mean', 'std'))#
Aggregate results over the repeats.
- Parameters:
results (Results) – Results object containing the results to aggregate.
metrics (list[str]) – Metrics used for aggregation.
aggregator (str | tuple[str, ...]) – Aggregator to use. The aggregators are the ones used in pandas. (Default: (“mean”, “std”))
- Returns:
The aggregated results as a pd.DataFrame.
- Return type:
DataFrame
- concat(datatup_list, *, ignore_index=False)#
Concatenate the data tuples in the given list.
- Parameters:
datatup_list (Sequence[T]) – List of data tuples to concatenate.
ignore_index (bool) – Ignore the index of the dataframes. (Default: False)
- Returns:
The concatenated data tuple.
- Return type:
T
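concat works on any of the tuple types above (the T in the signature). A sketch stacking two DataTuples; the helper function and column names are illustrative assumptions:

```python
import pandas as pd

from ethicml import DataTuple, concat  # assumed import path


def toy(feat_values, name):
    """Build a tiny DataTuple for the example; column names are illustrative."""
    x = pd.DataFrame({"feat": feat_values})
    s = pd.Series([0] * len(feat_values), name="sens")
    y = pd.Series([1] * len(feat_values), name="label")
    return DataTuple.from_df(x=x, s=s, y=y, name=name)


combined = concat([toy([0.1, 0.2], "part_a"), toy([0.3], "part_b")], ignore_index=True)
print(len(combined))  # 3
```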
- filter_and_map_results(results, mapping)#
Filter entries and change the index with a mapping.
- filter_results(results, values, index='model')#
Filter the entries based on the given values.
- make_results(data_frame=None)#
Initialise Results object.
You should always use this function instead of using the “constructor” directly, because this function checks whether the columns are correct.
- Parameters:
data_frame (None | DataFrame | Path) – A dataframe to use for initialization. (Default: None)
- Returns:
An initialised Results object.
- Return type:
Results
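A sketch of obtaining a Results object via make_results rather than building the DataFrame by hand; the CSV path is a placeholder, and its columns must satisfy the check that make_results performs:

```python
from pathlib import Path

from ethicml import make_results  # assumed import path

empty = make_results()                      # start with an empty, validated Results object
loaded = make_results(Path("results.csv"))  # or load a previously saved CSV (placeholder path)
```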
- map_over_results_index(results, mapper)#
Change the values of the index with a transformation function.
- shuffle_df(df, random_state)#
Shuffle a given dataframe.
- Parameters:
df (DataFrame) –
random_state (int) –
- Return type:
DataFrame
- undo_one_hot(df, new_column_name=None)#
Undo one-hot encoding.
- Parameters:
df (DataFrame) –
new_column_name (str | None) –
- Return type:
Series | DataFrame
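Two small helpers sketched together: shuffle_df reshuffles rows reproducibly given a seed, and undo_one_hot collapses a block of one-hot columns back into a single column. The column names, the import path, and the exact encoding of the output column are illustrative assumptions:

```python
import pandas as pd

from ethicml import shuffle_df, undo_one_hot  # assumed import path

df = pd.DataFrame(
    {"colour_red": [1, 0, 0], "colour_green": [0, 1, 0], "colour_blue": [0, 0, 1]}
)

shuffled = shuffle_df(df, random_state=42)            # same seed -> same row order
colour = undo_one_hot(df, new_column_name="colour")   # single column replacing the one-hot block
print(type(colour))  # Series (or DataFrame, per the documented return type)
```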
Aliases#
- class Results#
Container for results from evaluate_models().
alias of DataFrame
- TestTuple#
Union of SubgroupTuple and DataTuple.
- EvalTuple#
Union of LabelTuple and DataTuple.
- HyperParamValue#
alias of Union[bool, int, float, str]
- HyperParamType#
alias of Dict[str, Union[bool, int, float, str]]