Preprocess algorithms#

Pre-processing algorithms take the training data and transform it before a model is trained on it.

Preprocess base#

Abstract Base Class of all algorithms in the framework.

class PreAlgorithm#

Bases: Algorithm, ABC

Abstract Base Class for all algorithms that do pre-processing.

abstract fit(train, seed=888)#

Fit transformer on the given data.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of Self and the transformed training data.

Return type:

tuple[Self, DataTuple]

abstract property name: str#

Name of the algorithm.

abstract property out_size: int#

Return the number of features to generate.

abstract run(train, test, seed=888)#

Generate fair features with the given data.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • test (T) – Data tuple of the test data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

run_test(train, test, seed=888)#

Run with a reduced training set so that it finishes more quickly.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • test (T) – Data tuple of the test data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

abstract transform(data)#

Generate fair features with the given data.

Parameters:

data (T) – Data to transform.

Returns:

Transformed data.

Return type:

T
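
All concrete pre-processing algorithms documented below (Beutel, Calders, Upsampler, VFAE, Zemel) implement this interface, so they support two equivalent workflows: a one-shot run, or fit on the training data followed by transform on new data. A minimal usage sketch, assuming train and test are already-split DataTuple objects and that the classes are importable from the top-level ethicml package (the import path and data-splitting code are assumptions):

    from ethicml import DataTuple, PreAlgorithm  # import path is an assumption

    def apply_preprocessing(algo: PreAlgorithm, train: DataTuple, test: DataTuple):
        # One-shot workflow: transform the training and test data together.
        train_z, test_z = algo.run(train, test, seed=888)

        # Equivalent two-step workflow: fit on the training data,
        # then reuse the fitted transformer on the test data.
        fitted, train_z2 = algo.fit(train, seed=888)
        test_z2 = fitted.transform(test)

        return train_z, test_z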

Beutel#

Beutel’s algorithm.

class Beutel(dir=PosixPath('.'), fairness=FairnessType.dp, enc_size=<factory>, adv_size=<factory>, pred_size=<factory>, enc_activation='Sigmoid()', adv_activation='Sigmoid()', batch_size=64, y_loss='CrossEntropyLoss()', s_loss='BCELoss()', epochs=50, adv_weight=1.0, validation_pcnt=0.1)#

Bases: PreAlgorithmSubprocess

Beutel’s adversarially learned fair representations.

Parameters:
  • dir (Path) –

  • fairness (FairnessType) –

  • enc_size (List[int]) –

  • adv_size (List[int]) –

  • pred_size (List[int]) –

  • enc_activation (str) –

  • adv_activation (str) –

  • batch_size (int) –

  • y_loss (str) –

  • s_loss (str) –

  • epochs (int) –

  • adv_weight (float) –

  • validation_pcnt (float) –

call_script(cmd_args, env=None, cwd=None)#

Call a (Python) script as a separate process.

Parameters:
  • cmd_args (list[str]) – List of strings that are passed as command-line arguments to the executable.

  • env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.

  • cwd (Path | None) – If not None, change the working directory to the given path before running the command.

Raises:

RuntimeError – If the called script failed or timed out.

Return type:

None

property executable: list[str]#

Path to a (Python) executable.

By default, the Python executable that called this script is used.

fit(train, seed=888)#

Fit transformer in a subprocess on the given data.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of Self and the transformed training data.

Return type:

tuple[Self, DataTuple]

property model_path: Path#

Path to where the model will be stored.

property name: str#

Name of the algorithm.

property out_size: int#

Return the number of features to generate.

run(train, test, seed=888)#

Generate fair features in a subprocess with the given data.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • test (T) – Data tuple of the test data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

run_test(train, test, seed=888)#

Run with a reduced training set so that it finishes more quickly.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • test (T) – Data tuple of the test data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

transform(data)#

Generate fair features in a subprocess with the given data.

Parameters:

data (T) – Data to transform.

Returns:

Transformed data.

Return type:

T
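
A minimal usage sketch for Beutel, assuming train and test DataTuple objects exist and that the class is importable from the top-level ethicml package (import path assumed); the keyword arguments are the documented constructor parameters:

    from ethicml import Beutel, FairnessType  # import path is an assumption

    def adversarial_representation(train, test):
        beutel = Beutel(
            fairness=FairnessType.dp,  # target demographic parity
            batch_size=64,
            epochs=50,
            adv_weight=1.0,            # weight of the adversarial loss
        )
        # Training runs in a separate process (PreAlgorithmSubprocess).
        train_z, test_z = beutel.run(train, test, seed=888)
        return train_z, test_z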

Calders#

Kamiran & Calders 2012, massaging.

class Calders(preferable_class=1, disadvantaged_group=0)#

Bases: PreAlgorithm

Massaging algorithm from Kamiran & Calders 2012.

Parameters:
  • preferable_class (int) –

  • disadvantaged_group (int) –

fit(train, seed=888)#

Fit transformer on the given data.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of Self and the transformed training data.

Return type:

tuple[Self, DataTuple]

property name: str#

Name of the algorithm.

property out_size: int#

Return the number of features to generate.

run(train, test, seed=888)#

Generate fair features with the given data.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • test (T) – Data tuple of the test data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

run_test(train, test, seed=888)#

Run with a reduced training set so that it finishes more quickly.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • test (T) – Data tuple of the test data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

transform(data)#

Generate fair features with the given data.

Parameters:

data (T) – Data to transform.

Returns:

Transformed data.

Return type:

T
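
A minimal usage sketch for Calders, under the same assumptions as above (import path assumed; train and test are DataTuples):

    from ethicml import Calders  # import path is an assumption

    def massage(train, test):
        # "Massaging": relabel selected training samples so that the
        # positive rate is balanced across groups before a model is trained.
        calders = Calders(preferable_class=1, disadvantaged_group=0)
        train_massaged, test_out = calders.run(train, test, seed=888)
        return train_massaged, test_out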

Upsampling#

Simple upsampler that makes subgroups the same size as the majority group.

class UpsampleStrategy(value)#

Bases: StrEnum

Strategy for upsampling.

class Upsampler(strategy=UpsampleStrategy.uniform)#

Bases: PreAlgorithm

Upsampler algorithm.

Given a DataTuple, create a larger DataTuple such that the subgroups have a balanced number of samples.

Parameters:

strategy (UpsampleStrategy) –

fit(train, seed=888)#

Fit transformer on the given data.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of Self and the transformed training data.

Return type:

tuple[Self, DataTuple]

property name: str#

Name of the algorithm.

property out_size: int#

Return the number of features to generate.

run(train, test, seed=888)#

Generate fair features with the given data.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • test (T) – Data tuple of the test data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

run_test(train, test, seed=888)#

Run with a reduced training set so that it finishes more quickly.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • test (T) – Data tuple of the test data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

transform(data)#

Generate fair features with the given data.

Parameters:

data (T) – Data to transform.

Returns:

Transformed data.

Return type:

T
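
A minimal usage sketch for Upsampler, under the same assumptions (import path assumed; train and test are DataTuples):

    from ethicml import Upsampler, UpsampleStrategy  # import path is an assumption

    def balance(train, test):
        # Duplicate samples from smaller subgroups until every subgroup
        # matches the size of the largest one (uniform strategy).
        upsampler = Upsampler(strategy=UpsampleStrategy.uniform)
        train_balanced, test_out = upsampler.run(train, test, seed=888)
        return train_balanced, test_out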

VFAE#

Variational Fair Auto-Encoder by Louizos et al.

class VFAE(dir=PosixPath('.'), dataset='toy', supervised=True, epochs=10, batch_size=32, fairness=FairnessType.dp, latent_dims=50, z1_enc_size=<factory>, z2_enc_size=<factory>, z1_dec_size=<factory>)#

Bases: PreAlgorithmSubprocess

VFAE object; see the implementation file for details.

Parameters:
  • dir (Path) –

  • dataset (str) –

  • supervised (bool) –

  • epochs (int) –

  • batch_size (int) –

  • fairness (FairnessType) –

  • latent_dims (int) –

  • z1_enc_size (List[int]) –

  • z2_enc_size (List[int]) –

  • z1_dec_size (List[int]) –

call_script(cmd_args, env=None, cwd=None)#

Call a (Python) script as a separate process.

Parameters:
  • cmd_args (list[str]) – List of strings that are passed as command-line arguments to the executable.

  • env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.

  • cwd (Path | None) – If not None, change the working directory to the given path before running the command.

Raises:

RuntimeError – If the called script failed or timed out.

Return type:

None

property executable: list[str]#

Path to a (Python) executable.

By default, the Python executable that called this script is used.

fit(train, seed=888)#

Fit transformer in a subprocess on the given data.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of Self and the transformed training data.

Return type:

tuple[Self, DataTuple]

property model_path: Path#

Path to where the model will be stored.

property name: str#

Name of the algorithm.

property out_size: int#

Return the number of features to generate.

run(train, test, seed=888)#

Generate fair features in a subprocess with the given data.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • test (T) – Data tuple of the test data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

run_test(train, test, seed=888)#

Run with a reduced training set so that it finishes more quickly.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • test (T) – Data tuple of the test data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

transform(data)#

Generate fair features in a subprocess with the given data.

Parameters:

data (T) – Data to transform.

Returns:

Transformed data.

Return type:

T
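
A minimal usage sketch for VFAE, under the same assumptions (import path assumed; train and test are DataTuples):

    from ethicml import VFAE, FairnessType  # import path is an assumption

    def encode(train, test):
        # Learn a latent representation from which the sensitive attribute
        # is (approximately) factored out; training runs in a subprocess.
        vfae = VFAE(
            supervised=True,
            epochs=10,
            batch_size=32,
            fairness=FairnessType.dp,
            latent_dims=50,
        )
        train_z, test_z = vfae.run(train, test, seed=888)
        return train_z, test_z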

Zemel#

Zemel’s Learned Fair Representations.

class Zemel(dir=PosixPath('.'), threshold=0.5, clusters=2, Ax=0.01, Ay=0.1, Az=0.5, max_iter=5000, maxfun=5000, epsilon=1e-05)#

Bases: PreAlgorithmSubprocess

AIF360 implementation of Zemel’s LFR.

Parameters:
  • dir (Path) –

  • threshold (float) –

  • clusters (int) –

  • Ax (float) –

  • Ay (float) –

  • Az (float) –

  • max_iter (int) –

  • maxfun (int) –

  • epsilon (float) –

call_script(cmd_args, env=None, cwd=None)#

Call a (Python) script as a separate process.

Parameters:
  • cmd_args (list[str]) – List of strings that are passed as command-line arguments to the executable.

  • env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.

  • cwd (Path | None) – If not None, change the working directory to the given path before running the command.

Raises:

RuntimeError – If the called script failed or timed out.

Return type:

None

property executable: list[str]#

Path to a (Python) executable.

By default, the Python executable that called this script is used.

fit(train, seed=888)#

Fit transformer in a subprocess on the given data.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of Self and the transformed training data.

Return type:

tuple[Self, DataTuple]

property model_path: Path#

Path to where the model will be stored.

property name: str#

Name of the algorithm.

property out_size: int#

Return the number of features to generate.

run(train, test, seed=888)#

Generate fair features in a subprocess with the given data.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • test (T) – Data tuple of the test data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

run_test(train, test, seed=888)#

Run with a reduced training set so that it finishes more quickly.

Parameters:
  • train (DataTuple) – Data tuple of the training data.

  • test (T) – Data tuple of the test data.

  • seed (int) – Random seed for model initialization.

Returns:

A tuple of the transformed training data and the test data.

Return type:

tuple[DataTuple, T]

transform(data)#

Generate fair features in a subprocess with the given data.

Parameters:

data (T) – Data to transform.

Returns:

Transformed data.

Return type:

T
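
A minimal usage sketch for Zemel, under the same assumptions (import path assumed; train and test are DataTuples). In Zemel's LFR objective, Ax, Ay and Az are understood to weight the reconstruction, prediction and fairness terms respectively:

    from ethicml import Zemel  # import path is an assumption

    def learn_fair_representation(train, test):
        # The values below are the documented defaults.
        zemel = Zemel(threshold=0.5, clusters=2, Ax=0.01, Ay=0.1, Az=0.5)
        train_z, test_z = zemel.run(train, test, seed=888)
        return train_z, test_z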