Preprocess algorithms#
Pre-process algorithms take the training data and transform it.
Preprocess base#
Abstract Base Class of all algorithms in the framework.
- class PreAlgorithm#
Bases:
Algorithm, ABC
Abstract Base Class for all algorithms that do pre-processing.
- abstract fit(train, seed=888)#
Fit transformer on the given data.
- abstract property name: str#
Name of the algorithm.
- abstract property out_size: int#
Return the number of features to generate.
- abstract run(train, test, seed=888)#
Generate fair features with the given data.
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- abstract transform(data)#
Generate fair features with the given data.
- Parameters:
data (T) – Data to transform.
- Returns:
Transformed data.
- Return type:
T
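The interface above can be illustrated with a small, self-contained sketch. It does not import the real library: `SketchPreAlgorithm` and `MeanCentre` are hypothetical names, the data is a plain list of rows rather than the framework's data type `T`, and implementing `run` as "fit on train, then transform both splits" is an assumption made for illustration.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

Rows = List[List[float]]  # stand-in for the framework's data type (assumption)


class SketchPreAlgorithm(ABC):
    """Hypothetical mirror of the abstract members listed above."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def out_size(self) -> int: ...

    @abstractmethod
    def fit(self, train: Rows, seed: int = 888) -> "SketchPreAlgorithm": ...

    @abstractmethod
    def transform(self, data: Rows) -> Rows: ...

    @abstractmethod
    def run(self, train: Rows, test: Rows, seed: int = 888) -> Tuple[Rows, Rows]: ...


class MeanCentre(SketchPreAlgorithm):
    """Toy transformer: subtracts the per-column mean learned from the training rows."""

    def __init__(self) -> None:
        self._means: List[float] = []

    @property
    def name(self) -> str:
        return "Mean centre (sketch)"

    @property
    def out_size(self) -> int:
        return len(self._means)

    def fit(self, train: Rows, seed: int = 888) -> "MeanCentre":
        # The seed is unused here because this toy transformer is deterministic.
        self._means = [sum(col) / len(train) for col in zip(*train)]
        return self

    def transform(self, data: Rows) -> Rows:
        return [[x - m for x, m in zip(row, self._means)] for row in data]

    def run(self, train: Rows, test: Rows, seed: int = 888) -> Tuple[Rows, Rows]:
        # Assumed behaviour: fit on the training rows, then transform both splits.
        self.fit(train, seed)
        return self.transform(train), self.transform(test)


algo = MeanCentre()
new_train, new_test = algo.run([[1.0, 2.0], [3.0, 4.0]], [[2.0, 3.0]])
```

Because every member of the base class is abstract, instantiating `SketchPreAlgorithm` directly raises `TypeError`; only a subclass that implements all of them can be created.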
Beutel#
Beutel’s algorithm.
- class Beutel(dir=PosixPath('.'), fairness=FairnessType.dp, enc_size=<factory>, adv_size=<factory>, pred_size=<factory>, enc_activation='Sigmoid()', adv_activation='Sigmoid()', batch_size=64, y_loss='CrossEntropyLoss()', s_loss='BCELoss()', epochs=50, adv_weight=1.0, validation_pcnt=0.1)#
Bases:
PreAlgorithmSubprocess
Beutel’s adversarially learned fair representations.
- Parameters:
dir (Path) –
fairness (FairnessType) –
enc_size (List[int]) –
adv_size (List[int]) –
pred_size (List[int]) –
enc_activation (str) –
adv_activation (str) –
batch_size (int) –
y_loss (str) –
s_loss (str) –
epochs (int) –
adv_weight (float) –
validation_pcnt (float) –
- call_script(cmd_args, env=None, cwd=None)#
Call a (Python) script as a separate process.
- Parameters:
cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running the command.
- Raises:
RuntimeError – If the called script failed or timed out.
- Return type:
None
- property executable: list[str]#
Path to a (Python) executable.
By default, the Python executable that called this script is used.
- fit(train, seed=888)#
Fit transformer in a subprocess on the given data.
- property model_path: Path#
Path to where the model will be stored.
- property name: str#
Name of the algorithm.
- property out_size: int#
Return the number of features to generate.
- run(train, test, seed=888)#
Generate fair features in a subprocess with the given data.
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- transform(data)#
Generate fair features in a subprocess with the given data.
- Parameters:
data (T) – Data to transform.
- Returns:
Transformed data.
- Return type:
T
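The behaviour documented for `call_script` and `executable` (run a script with the current Python interpreter, raise `RuntimeError` on failure or timeout) can be sketched with the standard library alone. This is not the library's implementation: `call_script_sketch` and its `timeout` parameter are hypothetical, and the real method may differ in how it launches and monitors the process.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Optional


def call_script_sketch(
    cmd_args: List[str],
    env: Optional[Dict[str, str]] = None,
    cwd: Optional[Path] = None,
    timeout: float = 60.0,  # hypothetical parameter, not part of the documented signature
) -> None:
    """Run the current Python interpreter with ``cmd_args`` as a separate process."""
    # Mirrors the documented default: use the interpreter that is running this code.
    cmd = [sys.executable, *cmd_args]
    try:
        proc = subprocess.run(
            cmd,
            env=env,  # None inherits the parent environment; a dict replaces it entirely
            cwd=cwd,  # None keeps the current working directory
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"Script timed out after {timeout} seconds.") from exc
    if proc.returncode != 0:
        raise RuntimeError(
            f"Script failed with exit code {proc.returncode}: {proc.stderr.strip()}"
        )


result = call_script_sketch(["-c", "print('ok')"])
```

Note that passing `env` replaces the whole environment of the child process rather than extending it, so a dictionary such as `{"PATH": "/usr/bin"}` leaves the child with that single variable.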
Calders#
Kamiran & Calders 2012, massaging.
- class Calders(preferable_class=1, disadvantaged_group=0)#
Bases:
PreAlgorithm
Massaging algorithm from Kamiran & Calders 2012.
- Parameters:
preferable_class (int) –
disadvantaged_group (int) –
- fit(train, seed=888)#
Fit transformer on the given data.
- property name: str#
Name of the algorithm.
- property out_size: int#
Return the number of features to generate.
- run(train, test, seed=888)#
Generate fair features with the given data.
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- transform(data)#
Generate fair features with the given data.
- Parameters:
data (T) – Data to transform.
- Returns:
Transformed data.
- Return type:
T
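As a rough illustration of the massaging idea, the sketch below relabels a toy dataset so that both groups end up with the same rate of the preferable class. It is a simplified reading of the technique, not this class's implementation: `massage_labels` is a hypothetical helper, labels are assumed to be binary (0/1), the ranker scores are assumed to be supplied by the caller, and the number of swaps is rounded up.

```python
from typing import List, Sequence


def massage_labels(
    labels: Sequence[int],
    groups: Sequence[int],
    scores: Sequence[float],
    preferable_class: int = 1,
    disadvantaged_group: int = 0,
) -> List[int]:
    """Swap labels near the decision boundary until the group rates are balanced."""
    n = len(labels)
    other_class = 1 - preferable_class  # assumes binary 0/1 labels
    dis = [i for i in range(n) if groups[i] == disadvantaged_group]
    adv = [i for i in range(n) if groups[i] != disadvantaged_group]
    dis_pos = sum(labels[i] == preferable_class for i in dis)
    adv_pos = sum(labels[i] == preferable_class for i in adv)

    # Number of promote/demote pairs needed to equalise the two rates (ceiling division).
    numerator = adv_pos * len(dis) - dis_pos * len(adv)
    swaps = -(-numerator // n) if numerator > 0 else 0

    # Promotion candidates: disadvantaged rows without the preferable class, best score first.
    promote = sorted(
        (i for i in dis if labels[i] != preferable_class),
        key=lambda i: scores[i],
        reverse=True,
    )[:swaps]
    # Demotion candidates: other rows with the preferable class, worst score first.
    demote = sorted(
        (i for i in adv if labels[i] == preferable_class),
        key=lambda i: scores[i],
    )[:swaps]

    pairs = min(len(promote), len(demote))  # keep the overall class balance unchanged
    new_labels = list(labels)
    for i in promote[:pairs]:
        new_labels[i] = preferable_class
    for i in demote[:pairs]:
        new_labels[i] = other_class
    return new_labels


labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
massaged = massage_labels(labels, groups, scores)
```

In this toy input the advantaged group starts with three of four positive labels and the disadvantaged group with one of four; a single swap leaves both groups at two of four.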
Upsampling#
Simple upsampler that makes subgroups the same size as the majority group.
- class UpsampleStrategy(value)#
Bases:
StrEnum
Strategy for upsampling.
- class Upsampler(strategy=UpsampleStrategy.uniform)#
Bases:
PreAlgorithm
Upsampler algorithm.
Given a datatuple, create a larger datatuple such that the subgroups have a balanced number of samples.
- Parameters:
strategy (UpsampleStrategy) –
- fit(train, seed=888)#
Fit transformer on the given data.
- property name: str#
Name of the algorithm.
- property out_size: int#
Return the number of features to generate.
- run(train, test, seed=888)#
Generate fair features with the given data.
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- transform(data)#
Generate fair features with the given data.
- Parameters:
data (T) – Data to transform.
- Returns:
Transformed data.
- Return type:
T
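The balancing step can be sketched as follows for the uniform case. This is a minimal illustration rather than the library's code: `upsample_uniform` is a hypothetical helper, rows are plain tuples, and a subgroup is assumed to be defined by whatever the caller's `key` function returns (for example, the combination of sensitive attribute and class label).

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Row = Tuple[int, int]  # (sensitive attribute, class label) in this toy example


def upsample_uniform(
    rows: Sequence[Row], key: Callable[[Row], Hashable], seed: int = 888
) -> List[Row]:
    """Duplicate uniformly sampled rows until every subgroup matches the largest one."""
    if not rows:
        return []
    rng = random.Random(seed)  # local generator so the result is reproducible
    subgroups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        subgroups[key(row)].append(row)
    target = max(len(members) for members in subgroups.values())

    balanced: List[Row] = []
    for members in subgroups.values():
        balanced.extend(members)  # keep every original row
        # Sample the shortfall with replacement from the same subgroup.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


rows = [(0, 0), (0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]
balanced = upsample_uniform(rows, key=lambda row: row)
counts = Counter(balanced)
```

Sampling with replacement means small subgroups contain repeated rows after balancing, which is worth keeping in mind when splitting the result for validation.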
VFAE#
Variational Fair Auto-Encoder by Louizos et al.
- class VFAE(dir=PosixPath('.'), dataset='toy', supervised=True, epochs=10, batch_size=32, fairness=FairnessType.dp, latent_dims=50, z1_enc_size=<factory>, z2_enc_size=<factory>, z1_dec_size=<factory>)#
Bases:
PreAlgorithmSubprocess
VFAE Object - see implementation file for details.
- Parameters:
dir (Path) –
dataset (str) –
supervised (bool) –
epochs (int) –
batch_size (int) –
fairness (FairnessType) –
latent_dims (int) –
z1_enc_size (List[int]) –
z2_enc_size (List[int]) –
z1_dec_size (List[int]) –
- call_script(cmd_args, env=None, cwd=None)#
Call a (Python) script as a separate process.
- Parameters:
cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running the command.
- Raises:
RuntimeError – If the called script failed or timed out.
- Return type:
None
- property executable: list[str]#
Path to a (Python) executable.
By default, the Python executable that called this script is used.
- fit(train, seed=888)#
Fit transformer in a subprocess on the given data.
- property model_path: Path#
Path to where the model will be stored.
- property name: str#
Name of the algorithm.
- property out_size: int#
Return the number of features to generate.
- run(train, test, seed=888)#
Generate fair features in a subprocess with the given data.
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- transform(data)#
Generate fair features in a subprocess with the given data.
- Parameters:
data (T) – Data to transform.
- Returns:
Transformed data.
- Return type:
T
Zemel#
Zemel’s Learned Fair Representations.
- class Zemel(dir=PosixPath('.'), threshold=0.5, clusters=2, Ax=0.01, Ay=0.1, Az=0.5, max_iter=5000, maxfun=5000, epsilon=1e-05)#
Bases:
PreAlgorithmSubprocess
AIF360 implementation of Zemel’s LFR.
- Parameters:
dir (Path) –
threshold (float) –
clusters (int) –
Ax (float) –
Ay (float) –
Az (float) –
max_iter (int) –
maxfun (int) –
epsilon (float) –
- call_script(cmd_args, env=None, cwd=None)#
Call a (Python) script as a separate process.
- Parameters:
cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running the command.
- Raises:
RuntimeError – If the called script failed or timed out.
- Return type:
None
- property executable: list[str]#
Path to a (Python) executable.
By default, the Python executable that called this script is used.
- fit(train, seed=888)#
Fit transformer in a subprocess on the given data.
- property model_path: Path#
Path to where the model will be stored.
- property name: str#
Name of the algorithm.
- property out_size: int#
Return the number of features to generate.
- run(train, test, seed=888)#
Generate fair features in a subprocess with the given data.
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- transform(data)#
Generate fair features in a subprocess with the given data.
- Parameters:
data (T) – Data to transform.
- Returns:
Transformed data.
- Return type:
T