Inprocess algorithms#
In-process algorithms take training data and make predictions.
Inprocess base#
Abstract Base Class of all algorithms in the framework.
- class InAlgorithm#
Bases:
Algorithm, ABC
Abstract Base Class for algorithms that run in the middle of the pipeline.
- abstract fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- abstract property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- abstract property name: str#
Name of the algorithm.
- abstract predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- abstract run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- final run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
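The fit / predict / run / run_test contract described above can be sketched with a small abstract base class. This is only an illustration: `SketchInAlgorithm` and `AlwaysPositive` are hypothetical names, plain sequences stand in for DataTuple, and both the fit-then-predict default for `run` and the 500-row cut in `run_test` are assumptions rather than the library's behaviour.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a DataTuple: feature rows plus a label sequence.
Row = Tuple[float, ...]


class SketchInAlgorithm(ABC):
    """Illustrative base class; not the library's InAlgorithm."""

    @abstractmethod
    def fit(
        self, train_x: Sequence[Row], train_y: Sequence[int], seed: int = 888
    ) -> "SketchInAlgorithm":
        """Train on the data and return self."""

    @abstractmethod
    def predict(self, test_x: Sequence[Row]) -> List[int]:
        """Return one prediction per test row."""

    def run(self, train_x, train_y, test_x, seed: int = 888) -> List[int]:
        # Assumed default: fit on the training data, then predict on the test data.
        return self.fit(train_x, train_y, seed=seed).predict(test_x)

    def run_test(self, train_x, train_y, test_x, seed: int = 888) -> List[int]:
        # Assumed reduction: keep only the first 500 training rows (arbitrary number).
        return self.run(train_x[:500], train_y[:500], test_x, seed=seed)


class AlwaysPositive(SketchInAlgorithm):
    """Trivial subclass that ignores the training data."""

    def fit(self, train_x, train_y, seed=888):
        return self

    def predict(self, test_x):
        return [1 for _ in test_x]
```

A subclass only has to supply `fit` and `predict`; `AlwaysPositive().run(train_x, train_y, test_x)` then returns one label per test row.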
- class InAlgorithmDC#
Bases:
InAlgorithm, ABC
Base class for algorithms that are dataclasses.
- abstract fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- abstract property name: str#
Name of the algorithm.
- abstract predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- abstract run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
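In InAlgorithmDC the hyperparameters property is no longer abstract. One plausible way a dataclass-based algorithm can provide it is to read its own fields, as in the sketch below. The class `SketchModel` and its fields are hypothetical, and treating every field as a hyperparameter is an assumption, not a statement about the library's implementation.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Union

HyperParamValue = Union[bool, int, float, str]


@dataclass
class SketchModel:
    """Hypothetical dataclass-style algorithm; the field names are made up."""

    C: float = 1.0
    kernel: str = "rbf"

    @property
    def hyperparameters(self) -> Dict[str, HyperParamValue]:
        # Assumption: every dataclass field counts as a hyperparameter.
        return asdict(self)
```

With this pattern, `SketchModel(C=0.5).hyperparameters` yields a dictionary keyed by field name, which matches the documented `Dict[str, bool | int | float | str]` shape.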
- class InAlgorithmNoParams#
Bases:
InAlgorithm, ABC
Base class for algorithms without parameters.
- abstract fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- abstract property name: str#
Name of the algorithm.
- abstract predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- abstract run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
Agarwal#
Implementation of Agarwal model.
- class Agarwal(dir=<factory>, fairness=FairnessType.dp, classifier=ClassifierType.lr, eps=0.1, iters=50, C=None, kernel=None)#
Bases:
InAlgorithmSubprocess
Agarwal class.
A wrapper around the Exponentiated Gradient method.
- Parameters:
fairness (FairnessType) – Type of fairness to enforce.
classifier (ClassifierType) – Type of classifier to use.
eps (float) – Epsilon fo.
iters (int) – Number of iterations for the DP algorithm.
C (float | None) – C parameter for the SVM algorithm.
kernel (KernelType | None) – Kernel type for the SVM algorithm.
dir (Path) –
- call_script(cmd_args, env=None, cwd=None)#
Call a (Python) script as a separate process.
- Parameters:
cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running command.
- Raises:
RuntimeError – If the called script failed or timed out.
- Return type:
None
- property executable: list[str]#
Path to a (Python) executable.
By default, the Python executable that called this script is used.
- fit(train, seed=888)#
Fit Algorithm in a subprocess on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property model_path: Path#
Path to where the model will be stored.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions in a subprocess on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- run(train, test, seed=888)#
Run Algorithm in a subprocess on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
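The call_script behaviour documented for this class (run a Python script as a separate process, raise RuntimeError on failure or timeout) can be sketched with the standard library. The function name `call_script_sketch` and the `timeout_s` parameter are hypothetical; the documented signature has no timeout argument, and the real method may be implemented differently.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Optional


def call_script_sketch(
    cmd_args: List[str],
    env: Optional[Dict[str, str]] = None,
    cwd: Optional[Path] = None,
    timeout_s: float = 60.0,  # hypothetical; not part of the documented signature
) -> None:
    """Run the current Python interpreter with cmd_args as a separate process."""
    # Mirrors the documented default: use the interpreter that runs this script.
    cmd = [sys.executable, *cmd_args]
    try:
        result = subprocess.run(
            cmd, env=env, cwd=cwd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"Script timed out after {timeout_s} seconds") from exc
    if result.returncode != 0:
        raise RuntimeError(
            f"Script failed with exit code {result.returncode}:\n{result.stderr}"
        )
```

Note that with `subprocess.run`, passing `env` replaces the child's whole environment rather than extending it, so a dictionary such as `{"PATH": "/usr/bin"}` leaves the child with only that variable.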
Distributionally-robust optimization#
Fairness without Demographics.
- class DRO(dir=<factory>, eta=0.5, epochs=10, batch_size=32, network_size=<factory>)#
Bases:
InAlgorithmSubprocess
Implementation of https://arxiv.org/abs/1806.08010 .
- Parameters:
eta (float) – Tolerance.
epochs (int) – The number of epochs to train for.
batch_size (int) – The batch size.
network_size (List[int]) – The size of the network.
dir (Path) –
- call_script(cmd_args, env=None, cwd=None)#
Call a (Python) script as a separate process.
- Parameters:
cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running command.
- Raises:
RuntimeError – If the called script failed or timed out.
- Return type:
None
- property executable: list[str]#
Path to a (Python) executable.
By default, the Python executable that called this script is used.
- fit(train, seed=888)#
Fit Algorithm in a subprocess on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property model_path: Path#
Path to where the model will be stored.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions in a subprocess on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- run(train, test, seed=888)#
Run Algorithm in a subprocess on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
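The referenced paper works with a dual form of the distributionally robust objective in which per-example losses below a threshold are ignored and the excess above it is squared. Assuming eta plays that threshold role here (the parameter list above only calls it a tolerance), the quantity being minimised can be sketched as follows. The function name `dro_surrogate` is hypothetical and this is not taken from the library's code.

```python
from typing import Sequence


def dro_surrogate(losses: Sequence[float], eta: float) -> float:
    """Mean of the squared excess of each per-example loss above eta."""
    # Losses at or below eta contribute nothing; larger losses are penalised
    # quadratically, which focuses training on the worst-off examples.
    excess = [max(loss - eta, 0.0) for loss in losses]
    return sum(e * e for e in excess) / len(excess)
```

For example, with losses `[0.2, 0.9, 1.5]` and `eta=0.5`, only the second and third examples contribute, with squared excesses of about 0.16 and 1.0.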
Base for installed model#
Installable model.
This is a somewhat complicated model, but it’s incredibly useful. Say you find a paper from a few years ago with code. It’s not unreasonable that there might be dependency clashes, Python version clashes, clashes galore. This approach downloads a model, runs it in its own venv and makes everyone happy.
- class InstalledModel(name, dir_name, top_dir, url=None, executable=None, *, use_pdm=False)#
Bases:
SubprocessAlgorithmMixin, InAlgorithm, ABC
The model that does the magic.
Download code from the given URL and create a Pip environment with the Pipfile found in the code.
- Parameters:
name (str) – Name of the model.
dir_name (str) – Where to download the code to (can be chosen freely).
top_dir (str) – Top directory of the repository where the Pipfile can be found (this is usually simply the last part of the repository URL).
url (str | None) – URL of the repository. (Default: None)
executable (list[str] | None) – Path to a Python executable. (Default: None)
use_pdm (bool) – If True, will try to use pdm instead of pipenv. (Default: False)
- call_script(cmd_args, env=None, cwd=None)#
Call a (Python) script as a separate process.
- Parameters:
cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running command.
- Raises:
RuntimeError – If the called script failed or timed out.
- Return type:
None
- property executable: list[str]#
Python executable from the virtualenv associated with the model.
- abstract fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- abstract property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- abstract predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- remove()#
Remove the directory that we created in _clone_directory().
- Return type:
None
- abstract run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
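The idea of running a model with its own interpreter can be sketched with the standard-library venv module. This is a swapped-in technique for illustration only: the class above is documented as using pipenv (or pdm), not venv, and the helper names `venv_python` and `ensure_env` are hypothetical.

```python
import os
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path of the interpreter inside a virtual environment directory."""
    # Virtual environments keep the interpreter under Scripts/ on Windows
    # and under bin/ elsewhere.
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def ensure_env(env_dir: Path) -> Path:
    """Create the environment if it is missing and return its interpreter."""
    if not venv_python(env_dir).exists():
        # with_pip=True bootstraps pip so dependencies can be installed later.
        venv.create(env_dir, with_pip=True)
    return venv_python(env_dir)
```

The returned path plays the role of the executable property: scripts launched with it see only the packages installed in that environment, so they cannot clash with the caller's dependencies.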
Kamiran#
Kamiran and Calders 2012.
- class Reweighting(classifier=ClassifierType.lr, C=None, kernel=None)#
Bases:
InAlgorithm
An implementation of the Reweighing method from Kamiran & Calders, 2012.
Each sample is assigned an instance weight based on the joint probability of S and Y, which is used during training of a classifier.
- Parameters:
classifier (ClassifierType) – The classifier to use.
C (float | None) – The C parameter for the classifier.
kernel (KernelType | None) – The kernel to use for the classifier if SVM selected.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- compute_instance_weights(train, *, balance_groups=False, upweight=False)#
Compute weights for all samples.
- Parameters:
train (DataTuple) – The training data.
balance_groups (bool) –
Whether to balance the groups. When False, the groups are balanced as in Kamiran and Calders 2012. When True, the groups are numerically balanced. (Default: False)
upweight (bool) – If balance_groups is True, whether to upweight the groups, or to downweight them. Downweighting is done by multiplying the weights by the inverse of the group size and is more numerically stable for small group sizes. (Default: False)
- Returns:
A dataframe with the instance weights for each sample in the training data.
- Return type:
DataFrame
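The Kamiran and Calders reweighing scheme assigns each sample the weight P(s)P(y) / P(s, y), the ratio of the expected to the observed joint probability of its group and label. The sketch below computes that ratio from counts, which appears to correspond to the balance_groups=False case described above. The function name `reweighing_weights` is hypothetical, and plain lists stand in for the DataTuple input and DataFrame output.

```python
from collections import Counter
from typing import List, Sequence


def reweighing_weights(s: Sequence[int], y: Sequence[int]) -> List[float]:
    """Weight each sample by P(s) * P(y) / P(s, y), estimated from counts."""
    n = len(s)
    count_s = Counter(s)           # samples per sensitive group
    count_y = Counter(y)           # samples per class label
    count_sy = Counter(zip(s, y))  # samples per (group, label) pair
    # (n_s / n) * (n_y / n) / (n_sy / n) simplifies to n_s * n_y / (n * n_sy).
    return [
        count_s[si] * count_y[yi] / (n * count_sy[(si, yi)])
        for si, yi in zip(s, y)
    ]
```

Combinations of group and label that are rarer than independence would predict receive weights above 1, and over-represented combinations receive weights below 1.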
Kamishima#
Wrapper for calling Kamishima model.
- class Kamishima(*, eta=1.0)#
Bases:
InstalledModel
Model that calls Kamishima’s code.
Based on Algo-Fairness https://github.com/algofairness/fairness-comparison/blob/master/fairness/algorithms/kamishima/KamishimaAlgorithm.py
- Parameters:
eta (float) – Tolerance.
- call_script(cmd_args, env=None, cwd=None)#
Call a (Python) script as a separate process.
- Parameters:
cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running command.
- Raises:
RuntimeError – If the called script failed or timed out.
- Return type:
None
- property executable: list[str]#
Python executable from the virtualenv associated with the model.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- remove()#
Remove the directory that we created in _clone_directory().
- Return type:
None
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
Logistic regression#
Wrapper around Sci-Kit Learn Logistic Regression.
- class LR(C=<factory>)#
Bases:
InAlgorithmDC
Logistic regression with hard predictions.
This is a wrapper around Sci-Kit Learn’s LogisticRegression. See the sklearn documentation for details.
- Parameters:
C (float) – The regularization parameter.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- class LRCV(n_splits=3)#
Bases:
InAlgorithmDC
Kind of a cheap hack for now, but gives a proper cross-validated LR.
- Parameters:
n_splits (int) – The number of splits for the cross-validation.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
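To make the n_splits parameter concrete, the sketch below shows how a dataset can be divided into folds for cross-validation. It assumes plain, unshuffled k-fold splitting; the wrapped implementation may stratify or shuffle instead, and the function name `k_fold_indices` is hypothetical.

```python
from typing import List, Tuple


def k_fold_indices(
    n_samples: int, n_splits: int = 3
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, validation_indices) pairs for each fold."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    # The first `extra` folds get one additional sample each.
    base, extra = divmod(n_samples, n_splits)
    folds = []
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        val = list(range(start, start + size))
        train = [j for j in range(n_samples) if j < start or j >= start + size]
        folds.append((train, val))
        start += size
    return folds
```

Each sample is used for validation exactly once, and a model is fitted on the remaining samples of every fold so that the regularisation strength can be chosen by average validation score.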
Majority#
Simply returns the majority label from the train set.
- class Majority#
Bases:
InAlgorithmNoParams
Simply returns the majority label from the train set.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
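A majority-label baseline is small enough to sketch in full. The class below is a hypothetical stand-in that takes a plain label sequence instead of a DataTuple and a row count instead of test data; it is not the library's Majority class.

```python
from collections import Counter
from typing import List, Sequence


class MajoritySketch:
    """Predicts the most frequent training label for every test row."""

    def fit(self, train_y: Sequence[int], seed: int = 888) -> "MajoritySketch":
        # most_common(1) returns [(label, count)]; on a tie the label that
        # was encountered first wins. The seed is unused because nothing is random.
        self.label_ = Counter(train_y).most_common(1)[0][0]
        return self

    def predict(self, n_test: int) -> List[int]:
        return [self.label_] * n_test
```

Such a baseline gives the accuracy that any learned model should beat, which is why it has no hyperparameters to report.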
Manual methods#
Manually specified (i.e. not learned) models.
- class Corels#
Bases:
InAlgorithmNoParams
CORELS (Certifiably Optimal RulE ListS) algorithm for the COMPAS dataset.
This algorithm uses if-statements to make predictions. It only works on COMPAS with s as sex.
From this paper: https://arxiv.org/abs/1704.01701
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
MLP#
Wrapper for SKLearn implementation of MLP.
- class MLP(hidden_layer_sizes=<factory>, batch_size=32, lr=0.001)#
Bases:
InAlgorithmDC
Multi-layer perceptron.
This is a wrapper around the SKLearn implementation of the MLP. Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
- Parameters:
hidden_layer_sizes (Tuple[int, ...]) – The number of neurons in each hidden layer.
batch_size (int) –
lr (float) –
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
Oracle#
Perfect predictors.
- class DPOracle#
Bases:
InAlgorithmNoParams
A perfect Demographic Parity Predictor.
Can only be used if test is a DataTuple, rather than the usual TestTuple. This model isn’t intended for general use, but can be useful if you want to either do a sanity check, or report potential values.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- class Oracle#
Bases:
InAlgorithmNoParams
A perfect predictor.
Can only be used if test is a DataTuple, rather than the usual TestTuple. This model isn’t intended for general use, but can be useful if you want to either do a sanity check, or report potential values.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
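The restriction that an oracle needs labelled test data follows from what it does: it returns the true labels as its predictions. A minimal sketch, with a hypothetical function name and an optional label sequence standing in for the difference between labelled and unlabelled test data:

```python
from typing import List, Optional, Sequence


def oracle_predict(test_y: Optional[Sequence[int]]) -> List[int]:
    """Return the true test labels as the predictions."""
    if test_y is None:
        # Unlabelled test data carries nothing for an oracle to copy.
        raise ValueError("an oracle needs the test labels")
    return list(test_y)
```

Metrics computed on such predictions give the best achievable values on a dataset, which is the sanity-check use mentioned above.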
SVM#
Wrapper for SKLearn implementation of SVM.
- class SVM(C=<factory>, kernel=<factory>)#
Bases:
InAlgorithmDC
A wrapper around the SciKitLearn Support Vector Classifier (SVC) model.
Documentation for the underlying classifier can be found here.
- Parameters:
C (float) – The penalty parameter of the error term.
kernel (KernelType) – The kernel to use.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
Zafar#
Algorithms by Zafar et al. for Demographic Parity.
- class ZafarAccuracy(*, gamma=0.5)#
Bases:
_ZafarAlgorithmBase
Zafar with fairness.
- Parameters:
gamma (float) –
- call_script(cmd_args, env=None, cwd=None)#
Call a (Python) script as a separate process.
- Parameters:
cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running command.
- Raises:
RuntimeError – If the called script failed or timed out.
- Return type:
None
- property executable: list[str]#
Python executable from the virtualenv associated with the model.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- remove()#
Remove the directory that we created in _clone_directory().
- Return type:
None
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- class ZafarBaseline#
Bases:
_ZafarAlgorithmBase
Zafar without fairness.
- call_script(cmd_args, env=None, cwd=None)#
Call a (Python) script as a separate process.
- Parameters:
cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running command.
- Raises:
RuntimeError – If the called script failed or timed out.
- Return type:
None
- property executable: list[str]#
Python executable from the virtualenv associated with the model.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return list of hyperparameters.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- remove()#
Remove the directory that we created in _clone_directory().
- Return type:
None
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- class ZafarEqOdds(*, tau=5.0, mu=1.2, eps=0.0001)#
Bases:
ZafarEqOpp
Zafar for Equalised Odds.
- Parameters:
tau (float) –
mu (float) –
eps (float) –
- call_script(cmd_args, env=None, cwd=None)#
Call a (Python) script as a separate process.
- Parameters:
cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running command.
- Raises:
RuntimeError – If the called script failed or timed out.
- Return type:
None
- property executable: list[str]#
Python executable from the virtualenv associated with the model.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return the hyperparameters as a dictionary.
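The property is typed as a mapping from hyperparameter names to bool, int, float or str values. As a rough illustration of that shape, the sketch below uses a hypothetical dataclass, EqOddsSettingsSketch, that only mirrors the documented defaults tau=5.0, mu=1.2 and eps=0.0001; how the real class builds its dictionary, and which keys it uses, is not stated here and may differ.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Union

HyperParamValue = Union[bool, int, float, str]


@dataclass
class EqOddsSettingsSketch:
    """Hypothetical stand-in; not part of the documented API."""

    tau: float = 5.0
    mu: float = 1.2
    eps: float = 0.0001

    @property
    def hyperparameters(self) -> Dict[str, HyperParamValue]:
        # Assumption: every constructor argument is reported under its own name.
        return asdict(self)


settings = EqOddsSettingsSketch(tau=2.5)
print(settings.hyperparameters)  # {'tau': 2.5, 'mu': 1.2, 'eps': 0.0001}
```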
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- remove()#
Remove the directory that we created in _clone_directory().
- Return type:
None
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- class ZafarEqOpp(*, tau=5.0, mu=1.2, eps=0.0001)#
Bases:
_ZafarAlgorithmBase
Zafar for Equality of Opportunity.
- Parameters:
tau (float) –
mu (float) –
eps (float) –
- call_script(cmd_args, env=None, cwd=None)#
Call a (Python) script as a separate process.
- Parameters:
cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running command.
- Raises:
RuntimeError – If the called script failed or timed out.
- Return type:
None
- property executable: list[str]#
Python executable from the virtualenv associated with the model.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return the hyperparameters as a dictionary.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- remove()#
Remove the directory that we created in _clone_directory().
- Return type:
None
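The pairing of a directory-creating step with remove() can be pictured as follows. This is a sketch under assumptions, not the library's code: ClonedDirSketch and its clone_directory method are hypothetical, and a temporary directory stands in for whatever _clone_directory() actually creates.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class ClonedDirSketch:
    """Hypothetical helper that owns one working directory."""

    def __init__(self) -> None:
        self._dir: Optional[Path] = None

    def clone_directory(self) -> Path:
        # Assumption: a fresh temporary directory stands in for the cloned code.
        if self._dir is None:
            self._dir = Path(tempfile.mkdtemp(prefix="zafar_sketch_"))
        return self._dir

    def remove(self) -> None:
        # Delete the directory if one was created; calling this twice is harmless.
        if self._dir is not None:
            shutil.rmtree(self._dir, ignore_errors=True)
            self._dir = None


if __name__ == "__main__":
    helper = ClonedDirSketch()
    workdir = helper.clone_directory()
    helper.remove()
    print(workdir.exists())  # False
```

Resetting the stored path to None after deletion makes remove() safe to call repeatedly, which is convenient in cleanup code that may run more than once.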
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- class ZafarFairness(*, C=0.001)#
Bases:
_ZafarAlgorithmBase
Zafar with fairness.
- Parameters:
C (float) –
- call_script(cmd_args, env=None, cwd=None)#
Call a (Python) script as a separate process.
- Parameters:
cmd_args (list[str]) – List of strings that are passed as commandline arguments to the executable.
env (dict[str, str] | None) – Environment variables specified as a dictionary; e.g. {"PATH": "/usr/bin"}.
cwd (Path | None) – If not None, change working directory to the given path before running command.
- Raises:
RuntimeError – If the called script failed or timed out.
- Return type:
None
- property executable: list[str]#
Python executable from the virtualenv associated with the model.
- fit(train, seed=888)#
Fit Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
seed (int) – Random seed for model initialization.
- Returns:
Self, but trained.
- Return type:
Self
- property hyperparameters: Dict[str, bool | int | float | str]#
Return the hyperparameters as a dictionary.
- property name: str#
Name of the algorithm.
- predict(test)#
Make predictions on the given data.
- Parameters:
test (SubgroupTuple | DataTuple) – Data to evaluate on.
- Returns:
Predictions on the test data.
- Return type:
- remove()#
Remove the directory that we created in _clone_directory().
- Return type:
None
- run(train, test, seed=888)#
Run Algorithm on the given data.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type:
- run_test(train, test, seed=888)#
Run with reduced training set so that it finishes quicker.
- Parameters:
train (DataTuple) – Data tuple of the training data.
test (SubgroupTuple | DataTuple) – Data to evaluate on.
seed (int) – Random seed for model initialization.
- Returns:
Predictions on the test data.
- Return type: