Preprocess algorithms#

Pre-process algorithms take the training data and transform it.

Preprocess base#

Abstract base class for all pre-processing algorithms in the framework.

class PreAlgorithm#

Bases: Algorithm, ABC

Abstract Base Class for all algorithms that do pre-processing.

abstract fit(train, seed=888)#

Fit transformer on the given data.

Parameters:

train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of Self and the transformed training data.

Return type:

tuple[Self, DataTuple]

abstract property name: str#: Name of the algorithm.

abstract property out_size: int#: Return the number of features to generate.

abstract run(train, test, seed=888)#

Generate fair features with the given data.

Parameters:

train (DataTuple) – Data tuple of the training data.
test (T) – Data tuple of the test data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

run_test(train, test, seed=888)#

Run with reduced training set so that it finishes quicker.

Parameters:

train (DataTuple) – Data tuple of the training data.
test (T) – Data tuple of the test data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

abstract transform(data)#

Generate fair features with the given data.

Parameters: data (T) – Data to transform.
Returns: Transformed data.
Return type: T
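
The contract described above can be illustrated with a minimal, self-contained sketch. It does not import the framework: `DataTuple` here is a hypothetical stand-in with only `x`, `s` and `y` fields, `PreAlgorithmSketch` mirrors the documented method names, and `MeanCentre` is an invented toy transformer (it only centres features and makes no fairness claim). The 500-row cut-off in `run_test` is likewise an assumption about what "reduced training set" could mean.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class DataTuple:
    """Hypothetical stand-in: features, sensitive attribute, labels."""

    x: list[list[float]]
    s: list[int]
    y: list[int]


class PreAlgorithmSketch(ABC):
    """Illustrative mirror of the documented interface, not the real class."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def out_size(self) -> int: ...

    @abstractmethod
    def fit(self, train: DataTuple, seed: int = 888) -> tuple["PreAlgorithmSketch", DataTuple]: ...

    @abstractmethod
    def transform(self, data: DataTuple) -> DataTuple: ...

    @abstractmethod
    def run(self, train: DataTuple, test: DataTuple, seed: int = 888) -> tuple[DataTuple, DataTuple]: ...

    def run_test(self, train: DataTuple, test: DataTuple, seed: int = 888) -> tuple[DataTuple, DataTuple]:
        # Assumption: shrink the training set to a short prefix, then run as usual.
        small = DataTuple(train.x[:500], train.s[:500], train.y[:500])
        return self.run(small, test, seed)


class MeanCentre(PreAlgorithmSketch):
    """Toy transformer: subtracts the per-feature mean learned in fit()."""

    def __init__(self) -> None:
        self._means: list[float] = []

    @property
    def name(self) -> str:
        return "MeanCentre"

    @property
    def out_size(self) -> int:
        return len(self._means)

    def fit(self, train: DataTuple, seed: int = 888) -> tuple["MeanCentre", DataTuple]:
        n = len(train.x)
        self._means = [sum(col) / n for col in zip(*train.x)]
        # Return the fitted object together with the transformed training data.
        return self, self.transform(train)

    def transform(self, data: DataTuple) -> DataTuple:
        x = [[v - m for v, m in zip(row, self._means)] for row in data.x]
        return DataTuple(x, data.s, data.y)

    def run(self, train: DataTuple, test: DataTuple, seed: int = 888) -> tuple[DataTuple, DataTuple]:
        # One plausible composition: fit on train, then reuse the fitted state on test.
        _, new_train = self.fit(train, seed)
        return new_train, self.transform(test)
```

In this sketch `run` is simply `fit` followed by `transform`, which is why `fit` hands back the transformed training data alongside the fitted object.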

Beutel#

Beutel’s algorithm.

class Beutel(dir=PosixPath('.'), fairness=FairnessType.dp, enc_size=<factory>, adv_size=<factory>, pred_size=<factory>, enc_activation='Sigmoid()', adv_activation='Sigmoid()', batch_size=64, y_loss='CrossEntropyLoss()', s_loss='BCELoss()', epochs=50, adv_weight=1.0, validation_pcnt=0.1)#

Bases: PreAlgorithmSubprocess

Beutel’s adversarially learned fair representations.

Parameters:

dir (Path) –
fairness (FairnessType) –
enc_size (List[int]) –
adv_size (List[int]) –
pred_size (List[int]) –
enc_activation (str) –
adv_activation (str) –
batch_size (int) –
y_loss (str) –
s_loss (str) –
epochs (int) –
adv_weight (float) –
validation_pcnt (float) –

call_script(cmd_args, env=None, cwd=None)#

Call a (Python) script as a separate process.

Parameters:

cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running command.

Raises:

RuntimeError – If the called script failed or timed out.

Return type:

None
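
As a rough illustration of the behaviour documented for `call_script`, the sketch below launches the current Python interpreter as a child process and converts failures and timeouts into `RuntimeError`. It is not the framework's implementation: the function name `call_script_sketch` and the `timeout` argument are assumptions added for the example (the documented signature has no timeout parameter).

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def call_script_sketch(
    cmd_args: list[str],
    env: Optional[dict[str, str]] = None,
    cwd: Optional[Path] = None,
    timeout: float = 60.0,  # hypothetical: not part of the documented signature
) -> None:
    """Run `<python> <cmd_args...>` in a separate process."""
    # Mirrors the documented default: use the interpreter that is running this code.
    cmd = [sys.executable, *cmd_args]
    try:
        # env=None inherits the parent's environment; cwd=None keeps the working directory.
        subprocess.run(cmd, env=env, cwd=cwd, check=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"Script timed out after {timeout} s: {cmd}") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(f"Script failed with exit code {exc.returncode}: {cmd}") from exc
```

Note that with `subprocess.run`, passing a dictionary as `env` replaces the child's whole environment rather than extending it, so a dictionary containing only `PATH` leaves every other variable unset in the child.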

property executable: list[str]#

Path to a (Python) executable.

By default, the Python executable that called this script is used.

fit(train, seed=888)#

Fit transformer in a subprocess on the given data.

Parameters:

train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of Self and the transformed training data.

Return type:

tuple[Self, DataTuple]

property model_path: Path#: Path to where the model will be stored.

property name: str#: Name of the algorithm.

property out_size: int#: Return the number of features to generate.

run(train, test, seed=888)#

Generate fair features in a subprocess with the given data.

Parameters:

train (DataTuple) – Data tuple of the training data.
test (T) – Data tuple of the test data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

run_test(train, test, seed=888)#

Run with reduced training set so that it finishes quicker.

Parameters:

train (DataTuple) – Data tuple of the training data.
test (T) – Data tuple of the test data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

transform(data)#

Generate fair features in a subprocess with the given data.

Parameters: data (T) – Data to transform.
Returns: Transformed data.
Return type: T

Calders#

Kamiran & Calders 2012, massaging.

class Calders(preferable_class=1, disadvantaged_group=0)#

Bases: PreAlgorithm

Massaging algorithm from Kamiran & Calders 2012.

Parameters:

preferable_class (int) –
disadvantaged_group (int) –

fit(train, seed=888)#

Fit transformer on the given data.

Parameters:

train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of Self and the transformed training data.

Return type:

tuple[Self, DataTuple]

property name: str#: Name of the algorithm.

property out_size: int#: Return the number of features to generate.

run(train, test, seed=888)#

Generate fair features with the given data.

Parameters:

train (DataTuple) – Data tuple of the training data.
test (T) – Data tuple of the test data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

run_test(train, test, seed=888)#

Run with reduced training set so that it finishes quicker.

Parameters:

train (DataTuple) – Data tuple of the training data.
test (T) – Data tuple of the test data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

transform(data)#

Generate fair features with the given data.

Parameters: data (T) – Data to transform.
Returns: Transformed data.
Return type: T
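
For readers unfamiliar with massaging, the idea from Kamiran & Calders can be sketched in a few lines: relabel the most promising negatively labelled members of the disadvantaged group and the least promising positively labelled members of the other group until both groups have roughly the same positive rate. The sketch below is an illustration under stated assumptions, not this class's code: labels are assumed to be binary in {0, 1}, the ranker scores are supplied by the caller, and the function name `massage_labels` is hypothetical.

```python
def massage_labels(
    s: list[int],
    y: list[int],
    scores: list[float],
    preferable_class: int = 1,
    disadvantaged_group: int = 0,
) -> list[int]:
    """Return a relabelled copy of y with (approximately) equal positive rates."""
    other_class = 1 - preferable_class  # assumption: labels are 0/1
    dis = [i for i, g in enumerate(s) if g == disadvantaged_group]
    fav = [i for i, g in enumerate(s) if g != disadvantaged_group]
    dis_pos = sum(y[i] == preferable_class for i in dis)
    fav_pos = sum(y[i] == preferable_class for i in fav)

    # Number of label swaps m that equalises the two positive rates:
    # (fav_pos - m) / |fav| == (dis_pos + m) / |dis|
    m = max(round((len(dis) * fav_pos - len(fav) * dis_pos) / len(s)), 0)

    # Promotion candidates: disadvantaged, currently negative, highest score first.
    promote = sorted(
        (i for i in dis if y[i] != preferable_class),
        key=lambda i: scores[i],
        reverse=True,
    )[:m]
    # Demotion candidates: favoured, currently positive, lowest score first.
    demote = sorted(
        (i for i in fav if y[i] == preferable_class),
        key=lambda i: scores[i],
    )[:m]

    new_y = list(y)
    for i in promote:
        new_y[i] = preferable_class
    for i in demote:
        new_y[i] = other_class
    return new_y


s = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 1, 1, 1, 0]
scores = [0.2, 0.6, 0.4, 0.9, 0.8, 0.55, 0.7, 0.3]
print(massage_labels(s, y, scores))  # [0, 1, 0, 1, 1, 0, 1, 0]
```

In the example, one swap is enough: the positive rate moves from 1/4 versus 3/4 to 2/4 in both groups. Only labels change; the features are left untouched.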

Upsampling#

Simple upsampler that makes subgroups the same size as the majority group.

class UpsampleStrategy(value)#

Bases: StrEnum

Strategy for upsampling.

class Upsampler(strategy=UpsampleStrategy.uniform)#

Bases: PreAlgorithm

Upsampler algorithm.

Given a datatuple, create a larger datatuple such that the subgroups have a balanced number of samples.

Parameters: strategy (UpsampleStrategy) –

fit(train, seed=888)#

Fit transformer on the given data.

Parameters:

train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of Self and the transformed training data.

Return type:

tuple[Self, DataTuple]

property name: str#: Name of the algorithm.

property out_size: int#: Return the number of features to generate.

run(train, test, seed=888)#

Generate fair features with the given data.

Parameters:

train (DataTuple) – Data tuple of the training data.
test (T) – Data tuple of the test data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

run_test(train, test, seed=888)#

Run with reduced training set so that it finishes quicker.

Parameters:

train (DataTuple) – Data tuple of the training data.
test (T) – Data tuple of the test data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

transform(data)#

Generate fair features with the given data.

Parameters: data (T) – Data to transform.
Returns: Transformed data.
Return type: T
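
The balancing idea behind the upsampler can be sketched as follows. This is a simplified illustration rather than the class's actual code: it assumes that a subgroup is a combination of sensitive attribute `s` and label `y`, that rows are plain `(x, s, y)` tuples, and that uniform upsampling means drawing extra rows with replacement. The function name `upsample_uniform` is hypothetical.

```python
import random
from collections import defaultdict


def upsample_uniform(rows: list[tuple], seed: int = 888) -> list[tuple]:
    """Grow every (s, y) subgroup to the size of the largest one."""
    rng = random.Random(seed)  # local generator, so results are reproducible per seed

    # Bucket the rows by subgroup; each row is (x, s, y).
    groups: dict[tuple, list[tuple]] = defaultdict(list)
    for row in rows:
        groups[(row[1], row[2])].append(row)

    target = max(len(members) for members in groups.values())

    out: list[tuple] = []
    for members in groups.values():
        out.extend(members)  # keep every original row
        # Top up with uniformly drawn duplicates (k may be 0 for the largest group).
        out.extend(rng.choices(members, k=target - len(members)))
    return out
```

Because every original row is kept and smaller groups are only topped up, the result is never smaller than the input, which matches the "larger datatuple" wording above.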

VFAE#

Variational Fair Auto-Encoder by Louizos et al.

class VFAE(dir=PosixPath('.'), dataset='toy', supervised=True, epochs=10, batch_size=32, fairness=FairnessType.dp, latent_dims=50, z1_enc_size=<factory>, z2_enc_size=<factory>, z1_dec_size=<factory>)#

Bases: PreAlgorithmSubprocess

VFAE Object - see implementation file for details.

Parameters:

dir (Path) –
dataset (str) –
supervised (bool) –
epochs (int) –
batch_size (int) –
fairness (FairnessType) –
latent_dims (int) –
z1_enc_size (List[int]) –
z2_enc_size (List[int]) –
z1_dec_size (List[int]) –

call_script(cmd_args, env=None, cwd=None)#

Call a (Python) script as a separate process.

Parameters:

cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running command.

Raises:

RuntimeError – If the called script failed or timed out.

Return type:

None

property executable: list[str]#

Path to a (Python) executable.

By default, the Python executable that called this script is used.

fit(train, seed=888)#

Fit transformer in a subprocess on the given data.

Parameters:

train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of Self and the transformed training data.

Return type:

tuple[Self, DataTuple]

property model_path: Path#: Path to where the model will be stored.

property name: str#: Name of the algorithm.

property out_size: int#: Return the number of features to generate.

run(train, test, seed=888)#

Generate fair features in a subprocess with the given data.

Parameters:

train (DataTuple) – Data tuple of the training data.
test (T) – Data tuple of the test data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

run_test(train, test, seed=888)#

Run with reduced training set so that it finishes quicker.

Parameters:

train (DataTuple) – Data tuple of the training data.
test (T) – Data tuple of the test data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

transform(data)#

Generate fair features in a subprocess with the given data.

Parameters: data (T) – Data to transform.
Returns: Transformed data.
Return type: T

Zemel#

Zemel’s Learned Fair Representations.

class Zemel(dir=PosixPath('.'), threshold=0.5, clusters=2, Ax=0.01, Ay=0.1, Az=0.5, max_iter=5000, maxfun=5000, epsilon=1e-05)#

Bases: PreAlgorithmSubprocess

AIF360 implementation of Zemel’s LFR.

Parameters:

dir (Path) –
threshold (float) –
clusters (int) –
Ax (float) –
Ay (float) –
Az (float) –
max_iter (int) –
maxfun (int) –
epsilon (float) –
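
For orientation, the objective from the Learned Fair Representations paper by Zemel et al. is reproduced below. Mapping `Ax`, `Ay` and `Az` to the weights $A_x$, $A_y$ and $A_z$, and `clusters` to the number of prototypes $K$, follows the paper's naming and is an assumption about this wrapper rather than something stated on this page.

```latex
\begin{aligned}
L   &= A_z L_z + A_x L_x + A_y L_y, \\
L_z &= \sum_{k=1}^{K} \left| M_k^{+} - M_k^{-} \right|, \\
L_x &= \sum_{n=1}^{N} \lVert x_n - \hat{x}_n \rVert_2^2, \\
L_y &= \sum_{n=1}^{N} -y_n \log \hat{y}_n - (1 - y_n) \log (1 - \hat{y}_n).
\end{aligned}
```

Here $M_k^{+}$ and $M_k^{-}$ are the average probabilities of assignment to prototype $k$ in the two sensitive groups, $\hat{x}_n$ is the reconstruction of input $x_n$, and $\hat{y}_n$ is the predicted label. Under this reading, a larger $A_z$ favours group parity, a larger $A_x$ favours faithful reconstruction, and a larger $A_y$ favours predictive accuracy.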

call_script(cmd_args, env=None, cwd=None)#

Call a (Python) script as a separate process.

Parameters:

cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running command.

Raises:

RuntimeError – If the called script failed or timed out.

Return type:

None

property executable: list[str]#

Path to a (Python) executable.

By default, the Python executable that called this script is used.

fit(train, seed=888)#

Fit transformer in a subprocess on the given data.

Parameters:

train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of Self and the transformed training data.

Return type:

tuple[Self, DataTuple]

property model_path: Path#: Path to where the model will be stored.

property name: str#: Name of the algorithm.

property out_size: int#: Return the number of features to generate.

run(train, test, seed=888)#

Generate fair features in a subprocess with the given data.

Parameters:

train (DataTuple) – Data tuple of the training data.
test (T) – Data tuple of the test data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

run_test(train, test, seed=888)#

Run with reduced training set so that it finishes quicker.

Parameters:

train (DataTuple) – Data tuple of the training data.
test (T) – Data tuple of the test data.
seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

transform(data)#

Generate fair features in a subprocess with the given data.

Parameters: data (T) – Data to transform.
Returns: Transformed data.
Return type: T