Preprocess algorithms#

Pre-process algorithms take the training data and transform it.

Preprocess base#

Abstract base classes for all pre-processing algorithms in the framework.

class PreAlgorithm(*args, **kwargs)#

Bases: ethicml.algorithms.algorithm_base.Algorithm, Protocol

Abstract Base Class for all algorithms that do pre-processing.

abstract fit(train)#

Fit transformer on the given data.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.pre_algorithm._PA, ethicml.utility.data_structures.DataTuple]

abstract property name: str#: Name of the algorithm.

property out_size: int#: The number of features to generate.

abstract run(train, test)#

Generate fair features with the given data.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Returns

a tuple of the pre-processed training data and the test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

abstract transform(data)#

Generate fair features with the given data.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data

Return type

ethicml.algorithms.preprocess.pre_algorithm.T
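
The relationship between fit, transform and run listed above can be illustrated with a small stand-in. This is a minimal sketch, not EthicML code: ToyData and MeanCentre are hypothetical names, mean-centring is not a fairness method, and the assumption that run amounts to fit on the training data followed by transform on the test data is ours rather than something this page states.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ToyData:
    """Hypothetical stand-in for DataTuple / TestTuple: a list of feature rows."""

    x: List[List[float]]


class MeanCentre:
    """Toy pre-processor with the same method shapes as PreAlgorithm."""

    def __init__(self) -> None:
        self._means: List[float] = []

    @property
    def name(self) -> str:
        return "Mean centre (toy)"

    @property
    def out_size(self) -> int:
        # Number of features produced; only meaningful after fit().
        return len(self._means)

    def fit(self, train: ToyData) -> Tuple["MeanCentre", ToyData]:
        # Learn per-column means, then return (fitted transformer, transformed train).
        n = len(train.x)
        self._means = [sum(col) / n for col in zip(*train.x)]
        return self, self.transform(train)

    def transform(self, data: ToyData) -> ToyData:
        # Apply the learned statistics; the output has the same type as the input.
        return ToyData([[v - m for v, m in zip(row, self._means)] for row in data.x])

    def run(self, train: ToyData, test: ToyData) -> Tuple[ToyData, ToyData]:
        # Assumed composition: fit on train, then transform test with the same state.
        _, new_train = self.fit(train)
        return new_train, self.transform(test)
```

Calling `MeanCentre().run(train, test)` on two ToyData objects returns the centred training rows and the test rows centred with the training means, which mirrors the Tuple[DataTuple, TestTuple] return type of run.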

class PreAlgorithmAsync(*args, **kwargs)#

Bases: ethicml.algorithms.algorithm_base.SubprocessAlgorithmMixin, ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm, Protocol

Pre-Algorithm that can be run blocking and asynchronously.

fit(train)#

Generate fair features with the given data asynchronously.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm, ethicml.utility.data_structures.DataTuple]

abstract property name: str#: Name of the algorithm.

property out_size: int#: The number of features to generate.

run(train, test)#

Generate fair features with the given data asynchronously.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Returns

a tuple of the pre-processed training data and the test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

transform(data)#

Generate fair features with the given data asynchronously.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data

Return type

ethicml.algorithms.preprocess.pre_algorithm.T

class PreAlgorithmDC(seed=888)#

Bases: ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm

PreAlgorithm dataclass base class.

Parameters: seed (int) –
Return type: None

abstract fit(train)#

Fit transformer on the given data.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.pre_algorithm._PA, ethicml.utility.data_structures.DataTuple]

abstract property name: str#: Name of the algorithm.

property out_size: int#: The number of features to generate.

abstract run(train, test)#

Generate fair features with the given data.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Returns

a tuple of the pre-processed training data and the test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

abstract transform(data)#

Generate fair features with the given data.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data

Return type

ethicml.algorithms.preprocess.pre_algorithm.T

Beutel#

Beutel’s algorithm.

class Beutel(fairness='DP', *, dir='.', enc_size=(40,), adv_size=(40,), pred_size=(40,), enc_activation='Sigmoid()', adv_activation='Sigmoid()', batch_size=64, y_loss='BCELoss()', s_loss='BCELoss()', epochs=50, adv_weight=1.0, validation_pcnt=0.1, seed=888)#

Bases: ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithmAsync

Beutel’s adversarially learned fair representations.

Parameters

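fairness (Literal['DP', 'EqOp', 'EqOd']) – fairness criterion to enforce: demographic parity ('DP'), equality of opportunity ('EqOp') or equalised odds ('EqOd')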
dir (Union[str, pathlib.Path]) –
enc_size (Sequence[int]) –
adv_size (Sequence[int]) –
pred_size (Sequence[int]) –
enc_activation (str) –
adv_activation (str) –
batch_size (int) –
y_loss (str) –
s_loss (str) –
epochs (int) –
adv_weight (float) –
validation_pcnt (float) –
seed (int) –

fit(train)#

Generate fair features with the given data asynchronously.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm, ethicml.utility.data_structures.DataTuple]

property name: str#: Name of the algorithm.

property out_size: int#: The number of features to generate.

run(train, test)#

Generate fair features with the given data asynchronously.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Returns

a tuple of the pre-processed training data and the test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

transform(data)#

Generate fair features with the given data asynchronously.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data

Return type

ethicml.algorithms.preprocess.pre_algorithm.T

Calders#

Kamiran & Calders 2012, massaging.

class Calders(*, preferable_class, disadvantaged_group, seed=888)#

Bases: ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm

Massaging algorithm from Kamiran & Calders 2012.

Parameters

preferable_class (int) –
disadvantaged_group (int) –
seed (int) –

fit(train)#

Fit transformer on the given data.

Parameters: train (ethicml.utility.data_structures.DataTuple) – training data
Returns: a tuple of the fitted transformer and the pre-processed training data
Return type: Tuple[ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm, ethicml.utility.data_structures.DataTuple]

property name: str#: Name of the algorithm.

property out_size: int#: The number of features to generate.

run(train, test)#

Generate fair features with the given data.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Returns

a tuple of the pre-processed training data and the test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

transform(data)#

Generate fair features with the given data.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data

Return type

ethicml.algorithms.preprocess.pre_algorithm.T
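
The idea behind massaging can be sketched in a few lines of plain Python. This is an illustration of the relabelling step as commonly described for Kamiran & Calders' method, not EthicML's implementation: the function name massage, the externally supplied ranker scores, the binary 0/1 label assumption and the rounding of the swap count are all assumptions made here. Only the parameter names preferable_class and disadvantaged_group come from the signature above.

```python
from typing import List, Sequence


def massage(
    labels: Sequence[int],
    groups: Sequence[int],
    scores: Sequence[float],
    *,
    preferable_class: int = 1,
    disadvantaged_group: int = 0,
) -> List[int]:
    """Return a relabelled copy of `labels` (binary 0/1 labels assumed)."""
    if not labels:
        return []
    other_class = 1 - preferable_class
    dis = [i for i, g in enumerate(groups) if g == disadvantaged_group]
    adv = [i for i, g in enumerate(groups) if g != disadvantaged_group]
    dis_pos = sum(labels[i] == preferable_class for i in dis)
    adv_pos = sum(labels[i] == preferable_class for i in adv)

    # Number of swaps M solving (dis_pos + M) / |dis| == (adv_pos - M) / |adv|,
    # rounded to an integer and clipped at zero (rounding is a choice made here).
    m = max(0, round((adv_pos * len(dis) - dis_pos * len(adv)) / len(labels)))

    # Promotion candidates: disadvantaged samples without the preferable label,
    # highest ranker score first.
    promote = sorted(
        (i for i in dis if labels[i] != preferable_class),
        key=lambda i: scores[i],
        reverse=True,
    )[:m]
    # Demotion candidates: remaining samples with the preferable label,
    # lowest ranker score first.
    demote = sorted(
        (i for i in adv if labels[i] == preferable_class),
        key=lambda i: scores[i],
    )[:m]

    new_labels = list(labels)
    for i in promote:
        new_labels[i] = preferable_class
    for i in demote:
        new_labels[i] = other_class
    return new_labels
```

In this sketch the scores would come from any classifier that ranks samples by how likely they are to belong to the preferable class. Only labels change; features and group membership are left untouched.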

Upsampling#

Simple upsampler that makes subgroups the same size as the majority group.

class Upsampler(strategy='uniform', seed=888)#

Bases: ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm

Upsampler algorithm.

Given a DataTuple, create a larger DataTuple in which all subgroups have the same number of samples.

Parameters

strategy (Literal['uniform', 'preferential', 'naive']) –
seed (int) –

Return type

None

fit(train)#

Fit transformer on the given data.

Parameters: train (ethicml.utility.data_structures.DataTuple) – training data
Returns: a tuple of the fitted transformer and the pre-processed training data
Return type: Tuple[ethicml.algorithms.preprocess.upsampler.Upsampler, ethicml.utility.data_structures.DataTuple]

property name: str#: Name of the algorithm.

property out_size: int#: The number of features to generate.

run(train, test)#

Generate fair features with the given data.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Returns

a tuple of the pre-processed training data and the test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

transform(data)#

Generate fair features with the given data.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data

Return type

ethicml.algorithms.preprocess.pre_algorithm.T
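
The balancing described for the upsampler can be sketched as follows. This is a rough illustration rather than the library's code: it assumes that a subgroup is a combination of sensitive attribute s and label y, that missing samples are drawn uniformly at random with replacement (our reading of the 'uniform' strategy), and it uses plain tuples in place of a DataTuple. The name upsample_uniform is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[List[float], int, int]  # (features, s, y); stand-in for one DataTuple row


def upsample_uniform(rows: List[Row], seed: int = 888) -> List[Row]:
    """Grow every (s, y) subgroup to the size of the largest one."""
    if not rows:
        return []
    rng = random.Random(seed)  # local generator so the result is reproducible

    # Bucket rows by subgroup.
    subgroups: Dict[Tuple[int, int], List[Row]] = defaultdict(list)
    for row in rows:
        subgroups[(row[1], row[2])].append(row)

    target = max(len(members) for members in subgroups.values())

    balanced: List[Row] = []
    for members in subgroups.values():
        # Keep all original rows, then top up with uniformly drawn duplicates.
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced
```

Because every original row is kept and only duplicates are added, the output is never smaller than the input, matching the description of a "larger" result.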

VFAE#

Variational Fair Auto-Encoder by Louizos et al.

class VFAE(dataset, *, dir='.', supervised=True, epochs=10, batch_size=32, fairness='DI', latent_dims=50, z1_enc_size=None, z2_enc_size=None, z1_dec_size=None, seed=888)#

Bases: ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithmAsync

VFAE Object - see implementation file for details.

Parameters

dataset (str) –
dir (Union[str, pathlib.Path]) –
supervised (bool) –
epochs (int) –
batch_size (int) –
fairness (str) –
latent_dims (int) –
z1_enc_size (Optional[List[int]]) –
z2_enc_size (Optional[List[int]]) –
z1_dec_size (Optional[List[int]]) –
seed (int) –

fit(train)#

Generate fair features with the given data asynchronously.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm, ethicml.utility.data_structures.DataTuple]

property name: str#: Name of the algorithm.

property out_size: int#: The number of features to generate.

run(train, test)#

Generate fair features with the given data asynchronously.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Returns

a tuple of the pre-processed training data and the test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data
test (ethicml.utility.data_structures.TestTuple) – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

transform(data)#

Generate fair features with the given data asynchronously.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data

Return type

ethicml.algorithms.preprocess.pre_algorithm.T

Zemel#

Zemel’s Learned Fair Representations.

class Zemel(*, dir='.', threshold=0.5, clusters=2, Ax=0.01, Ay=0.1, Az=0.5, max_iter=5000, maxfun=5000, epsilon=1e-05, seed=888)#