Preprocess algorithms#

Pre-process algorithms take the training data and transform it.

Preprocess base#

Abstract base classes for all pre-processing algorithms in the framework.

class PreAlgorithm(*args, **kwargs)#

Bases: ethicml.algorithms.algorithm_base.Algorithm, Protocol

Abstract Base Class for all algorithms that do pre-processing.

abstract fit(train)#

Fit transformer on the given data.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.pre_algorithm._PA, ethicml.utility.data_structures.DataTuple]

abstract property name: str#

Name of the algorithm.

property out_size: int#

The number of features to generate.

abstract run(train, test)#

Generate fair features with the given data.

Parameters
  • train – training data

  • test – test data

Returns

a tuple of the pre-processed training data and the pre-processed test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters
  • train – training data

  • test – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

abstract transform(data)#

Generate fair features with the given data.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data, of the same type as the input

Return type

ethicml.algorithms.preprocess.pre_algorithm.T
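
The relationship between fit, transform, run and run_test can be illustrated with a small stand-alone sketch. It does not import EthicML: ToyData, ToyPreAlgorithm, MeanCentre and the max_rows argument are hypothetical names invented for this illustration, and the assumption that run is equivalent to fit on the training data followed by transform on the test data is ours, not something the reference above states.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


@dataclass(frozen=True)
class ToyData:
    """Hypothetical stand-in for a data tuple: feature rows only."""

    x: List[List[float]]


class ToyPreAlgorithm(Protocol):
    """Hypothetical interface using the same method names as PreAlgorithm."""

    def fit(self, train: ToyData) -> Tuple["ToyPreAlgorithm", ToyData]: ...

    def transform(self, data: ToyData) -> ToyData: ...

    def run(self, train: ToyData, test: ToyData) -> Tuple[ToyData, ToyData]: ...


class MeanCentre:
    """Toy transformer: subtracts the per-feature mean of the training data."""

    def __init__(self) -> None:
        self.means: List[float] = []

    def fit(self, train: ToyData) -> Tuple["MeanCentre", ToyData]:
        # Learn the statistics, then return (fitted transformer, transformed train).
        n = len(train.x)
        self.means = [sum(col) / n for col in zip(*train.x)]
        return self, self.transform(train)

    def transform(self, data: ToyData) -> ToyData:
        # Apply the learned statistics; input and output have the same type.
        return ToyData([[v - m for v, m in zip(row, self.means)] for row in data.x])

    def run(self, train: ToyData, test: ToyData) -> Tuple[ToyData, ToyData]:
        # Assumption: run == fit on train, then transform on test.
        _, new_train = self.fit(train)
        return new_train, self.transform(test)

    def run_test(
        self, train: ToyData, test: ToyData, max_rows: int = 2
    ) -> Tuple[ToyData, ToyData]:
        # Assumption: the "reduced training set" is simply the first max_rows rows.
        return self.run(ToyData(train.x[:max_rows]), test)


algo: ToyPreAlgorithm = MeanCentre()
new_train, new_test = algo.run(ToyData([[1.0, 10.0], [3.0, 30.0]]), ToyData([[2.0, 20.0]]))
```

Here new_train and new_test are both expressed relative to statistics learned from the training data only, which is the usual reason for separating fit from transform.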

class PreAlgorithmAsync(*args, **kwargs)#

Bases: ethicml.algorithms.algorithm_base.SubprocessAlgorithmMixin, ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm, Protocol

Pre-Algorithm that can be run blocking and asynchronously.

fit(train)#

Generate fair features with the given data asynchronously.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm, ethicml.utility.data_structures.DataTuple]

abstract property name: str#

Name of the algorithm.

property out_size: int#

The number of features to generate.

run(train, test)#

Generate fair features with the given data asynchronously.

Parameters
  • train – training data

  • test – test data

Returns

a tuple of the pre-processed training data and the pre-processed test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters
  • train – training data

  • test – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

transform(data)#

Generate fair features with the given data asynchronously.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data, of the same type as the input

Return type

ethicml.algorithms.preprocess.pre_algorithm.T

class PreAlgorithmDC(seed=888)#

Bases: ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm

PreAlgorithm dataclass base class.

Parameters

seed (int) –

Return type

None

abstract fit(train)#

Fit transformer on the given data.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.pre_algorithm._PA, ethicml.utility.data_structures.DataTuple]

abstract property name: str#

Name of the algorithm.

property out_size: int#

The number of features to generate.

abstract run(train, test)#

Generate fair features with the given data.

Parameters
  • train – training data

  • test – test data

Returns

a tuple of the pre-processed training data and the pre-processed test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters
  • train – training data

  • test – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

abstract transform(data)#

Generate fair features with the given data.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data, of the same type as the input

Return type

ethicml.algorithms.preprocess.pre_algorithm.T

Beutel#

Beutel’s algorithm.

class Beutel(fairness='DP', *, dir='.', enc_size=(40,), adv_size=(40,), pred_size=(40,), enc_activation='Sigmoid()', adv_activation='Sigmoid()', batch_size=64, y_loss='BCELoss()', s_loss='BCELoss()', epochs=50, adv_weight=1.0, validation_pcnt=0.1, seed=888)#

Bases: ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithmAsync

Beutel’s adversarially learned fair representations.

Parameters
  • fairness (Literal['DP', 'EqOp', 'EqOd']) –

  • dir (Union[str, pathlib.Path]) –

  • enc_size (Sequence[int]) –

  • adv_size (Sequence[int]) –

  • pred_size (Sequence[int]) –

  • enc_activation (str) –

  • adv_activation (str) –

  • batch_size (int) –

  • y_loss (str) –

  • s_loss (str) –

  • epochs (int) –

  • adv_weight (float) –

  • validation_pcnt (float) –

  • seed (int) –

fit(train)#

Generate fair features with the given data asynchronously.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm, ethicml.utility.data_structures.DataTuple]

property name: str#

Name of the algorithm.

property out_size: int#

The number of features to generate.

run(train, test)#

Generate fair features with the given data asynchronously.

Parameters
  • train – training data

  • test – test data

Returns

a tuple of the pre-processed training data and the pre-processed test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters
  • train – training data

  • test – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

transform(data)#

Generate fair features with the given data asynchronously.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data, of the same type as the input

Return type

ethicml.algorithms.preprocess.pre_algorithm.T

Calders#

Kamiran & Calders 2012, massaging.

class Calders(*, preferable_class, disadvantaged_group, seed=888)#

Bases: ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm

Massaging algorithm from Kamiran & Calders 2012.

Parameters
  • preferable_class (int) –

  • disadvantaged_group (int) –

  • seed (int) –
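
As a rough illustration of what "massaging" means, the sketch below relabels a small number of training points so that both groups end up with the same rate of the preferable class. It is a minimal stand-alone sketch under stated assumptions, not the Calders class itself: the function name massage, the externally supplied scores (in the paper these come from a learned ranker), binary labels in {0, 1}, and rounding the number of swaps to the nearest integer are all choices made for this example.

```python
from typing import List


def massage(
    s: List[int],
    y: List[int],
    scores: List[float],
    preferable_class: int = 1,
    disadvantaged_group: int = 0,
) -> List[int]:
    """Return a relabelled copy of y (hypothetical helper, binary labels assumed)."""
    n = len(y)
    dis = [i for i in range(n) if s[i] == disadvantaged_group]
    adv = [i for i in range(n) if s[i] != disadvantaged_group]
    dis_pos = sum(1 for i in dis if y[i] == preferable_class)
    adv_pos = sum(1 for i in adv if y[i] == preferable_class)

    # Number of label pairs to swap so that both groups reach the same
    # rate of the preferable class (rounded; never negative).
    m = max(round((adv_pos * len(dis) - dis_pos * len(adv)) / n), 0)

    # Promotion candidates: disadvantaged, currently unfavourable, highest score first.
    promote = sorted(
        (i for i in dis if y[i] != preferable_class), key=lambda i: -scores[i]
    )
    # Demotion candidates: advantaged, currently preferable, lowest score first.
    demote = sorted(
        (i for i in adv if y[i] == preferable_class), key=lambda i: scores[i]
    )
    k = min(m, len(promote), len(demote))

    new_y = list(y)
    other_class = 1 - preferable_class  # assumption: labels are 0 and 1
    for i in promote[:k]:
        new_y[i] = preferable_class
    for i in demote[:k]:
        new_y[i] = other_class
    return new_y
```

Only labels change in this sketch; the features and the sensitive attribute are left untouched, and labels are swapped in pairs so the overall number of preferable labels stays the same.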

fit(train)#

Fit transformer on the given data.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm, ethicml.utility.data_structures.DataTuple]

property name: str#

Name of the algorithm.

property out_size: int#

The number of features to generate.

run(train, test)#

Generate fair features with the given data.

Parameters
  • train – training data

  • test – test data

Returns

a tuple of the pre-processed training data and the pre-processed test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters
  • train – training data

  • test – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

transform(data)#

Generate fair features with the given data.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data, of the same type as the input

Return type

ethicml.algorithms.preprocess.pre_algorithm.T

Upsampling#

Simple upsampler that makes subgroups the same size as the majority group.

class Upsampler(strategy='uniform', seed=888)#

Bases: ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm

Upsampler algorithm.

Given a datatuple, create a larger datatuple such that the subgroups have a balanced number of samples.

Parameters
  • strategy (Literal['uniform', 'preferential', 'naive']) –

  • seed (int) –

Return type

None
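
The idea of balancing subgroups by upsampling can be sketched without EthicML as follows. This is only an illustration of resampling with replacement up to the size of the largest subgroup: the function name upsample_uniform, the Row layout, and the choice to define subgroups by the pair (sensitive attribute, label) are assumptions made for this example and may differ from what the 'uniform' strategy actually does.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout: (features x, sensitive attribute s, label y).
Row = Tuple[List[float], int, int]


def upsample_uniform(rows: List[Row], seed: int = 888) -> List[Row]:
    """Resample each (s, y) subgroup with replacement up to the largest subgroup's size."""
    rng = random.Random(seed)

    # Bucket the rows by subgroup.
    groups: Dict[Tuple[int, int], List[Row]] = defaultdict(list)
    for row in rows:
        groups[(row[1], row[2])].append(row)

    target = max(len(members) for members in groups.values())

    out: List[Row] = []
    for key in sorted(groups):
        members = groups[key]
        out.extend(members)  # keep every original row
        # Draw the missing rows uniformly at random from the same subgroup.
        out.extend(rng.choice(members) for _ in range(target - len(members)))
    return out
```

The output is never smaller than the input, and every subgroup present in the input ends up with the same number of rows as the largest one.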

fit(train)#

Fit transformer on the given data.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.upsampler.Upsampler, ethicml.utility.data_structures.DataTuple]

property name: str#

Name of the algorithm.

property out_size: int#

The number of features to generate.

run(train, test)#

Generate fair features with the given data.

Parameters
  • train – training data

  • test – test data

Returns

a tuple of the pre-processed training data and the pre-processed test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters
  • train – training data

  • test – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

transform(data)#

Generate fair features with the given data.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data, of the same type as the input

Return type

ethicml.algorithms.preprocess.pre_algorithm.T

VFAE#

Variational Fair Auto-Encoder by Louizos et al.

class VFAE(dataset, *, dir='.', supervised=True, epochs=10, batch_size=32, fairness='DI', latent_dims=50, z1_enc_size=None, z2_enc_size=None, z1_dec_size=None, seed=888)#

Bases: ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithmAsync

VFAE Object - see implementation file for details.

Parameters
  • dataset (str) –

  • dir (Union[str, pathlib.Path]) –

  • supervised (bool) –

  • epochs (int) –

  • batch_size (int) –

  • fairness (str) –

  • latent_dims (int) –

  • z1_enc_size (Optional[List[int]]) –

  • z2_enc_size (Optional[List[int]]) –

  • z1_dec_size (Optional[List[int]]) –

  • seed (int) –

fit(train)#

Generate fair features with the given data asynchronously.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm, ethicml.utility.data_structures.DataTuple]

property name: str#

Name of the algorithm.

property out_size: int#

The number of features to generate.

run(train, test)#

Generate fair features with the given data asynchronously.

Parameters
  • train – training data

  • test – test data

Returns

a tuple of the pre-processed training data and the pre-processed test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters
  • train – training data

  • test – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

transform(data)#

Generate fair features with the given data asynchronously.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data, of the same type as the input

Return type

ethicml.algorithms.preprocess.pre_algorithm.T

Zemel#

Zemel’s Learned Fair Representations.

class Zemel(*, dir='.', threshold=0.5, clusters=2, Ax=0.01, Ay=0.1, Az=0.5, max_iter=5000, maxfun=5000, epsilon=1e-05, seed=888)#

Bases: ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithmAsync

AIF360 implementation of Zemel’s LFR.

Parameters
  • dir (Union[str, pathlib.Path]) –

  • threshold (float) –

  • clusters (int) –

  • Ax (float) –

  • Ay (float) –

  • Az (float) –

  • max_iter (int) –

  • maxfun (int) –

  • epsilon (float) –

  • seed (int) –

Return type

None
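
The parameters Ax, Ay and Az are not described above. Assuming they play the same roles as in Zemel et al.'s Learned Fair Representations objective (weights on the reconstruction, prediction and statistical-parity terms respectively), the quantity being minimised can be sketched as a weighted sum. The function name lfr_objective and its three loss arguments are hypothetical; the defaults simply mirror the constructor signature above.

```python
def lfr_objective(
    recon_loss: float,
    pred_loss: float,
    parity_loss: float,
    Ax: float = 0.01,
    Ay: float = 0.1,
    Az: float = 0.5,
) -> float:
    """Weighted LFR-style objective (assumed form): Ax*L_x + Ay*L_y + Az*L_z."""
    # L_x: how well the representation reconstructs the input features.
    # L_y: how well the representation predicts the label.
    # L_z: how far the representation is from statistical parity between groups.
    return Ax * recon_loss + Ay * pred_loss + Az * parity_loss
```

Under this reading, raising Az relative to Ax and Ay trades reconstruction and predictive accuracy for a representation that carries less information about group membership.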

fit(train)#

Fit transformer on the given data.

Parameters

train (ethicml.utility.data_structures.DataTuple) – training data

Returns

a tuple of the fitted transformer and the pre-processed training data

Return type

Tuple[ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm, ethicml.utility.data_structures.DataTuple]

property name: str#

Name of the algorithm.

property out_size: int#

The number of features to generate.

run(train, test)#

Generate fair features with the given data.

Parameters
  • train – training data

  • test – test data

Returns

a tuple of the pre-processed training data and the pre-processed test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

run_test(train, test)#

Run with reduced training set so that it finishes quicker.

Parameters
  • train – training data

  • test – test data

Return type

Tuple[ethicml.utility.data_structures.DataTuple, ethicml.utility.data_structures.TestTuple]

transform(data)#

Generate fair features with the given data asynchronously.

Parameters

data (ethicml.algorithms.preprocess.pre_algorithm.T) – data to transform

Returns

the transformed data, of the same type as the input

Return type

ethicml.algorithms.preprocess.pre_algorithm.T