ethicml.run#

Module for evaluators that apply algorithms to datasets and compute metrics.

Classes:

CVResults

Stores the results of a cross validation experiment (see CrossValidator).

CrossValidator

A simple approach to Cross Validation.

Functions:

arrange_in_parallel

Arrange the given algorithms to run (embarrassingly) parallel.

evaluate_models

Evaluate all the given models for all the given datasets and compute all the given metrics.

load_results

Load results from a CSV file that was created by evaluate_models().

run_in_parallel

Run the given algorithms (embarrassingly) parallel.

class CVResults(results, model)#

Bases: object

Stores the results of a cross validation experiment (see CrossValidator).

This object isn’t meant to be iterated over directly. Instead, use the raw_storage property to access the results across all folds. Or, use the mean_storage property to access the average results for each parameter setting.

import ethicml as em
from ethicml import data, metrics, models
from ethicml.run import CrossValidator

train, test = em.train_test_split(data.Compas().load())
hyperparams = {"C": [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]}

cv = CrossValidator(models.LR, hyperparams, folds=3)
primary = metrics.Accuracy()
fair_measure = metrics.AbsCV()
cv_results = cv.run(train, measures=[primary, fair_measure])
best_result = cv_results.get_best_in_top_k(primary, fair_measure, top_k=3)

print(f"Best C: {best_result.params['C']}")
print(f"Best Accuracy: {best_result.scores['Accuracy']}")
print(f"Best CV Score: {best_result.scores['CV absolute']}")
print(cv_results.mean_storage)
print(cv_results.raw_storage)
Parameters:
  • results (list[ResultTuple]) – list of results, one entry per fold and hyper-parameter setting

  • model (type[InAlgorithm]) – the model class (not an instance) that was cross-validated

best(measure)#

Return a model initialised with the best hyper-parameters.

The best hyper-parameters are those that perform optimally on average across folds for a given metric.

Parameters:

measure (Metric) – the metric that determines which hyper-parameters count as best

Return type:

InAlgorithm

best_hyper_params(measure)#

Get hyper-parameters that return the ‘best’ result for the metric of interest.

Parameters:

measure (Metric) – the metric of interest

Return type:

dict[str, Any]
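
Continuing the example above, here is a minimal sketch that reuses cv_results, primary, train and test; it assumes the usual EthicML pattern of model.run(train, test) and metric.score(predictions, test):

# hyper-parameters that performed best on average across the folds, e.g. {"C": 0.1}
best_params = cv_results.best_hyper_params(primary)

# a ready-to-use model initialised with those hyper-parameters
best_model = cv_results.best(primary)

# train on the full training set and evaluate on the held-out test set
predictions = best_model.run(train, test)
print(primary.score(predictions, test))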

get_best_in_top_k(primary, secondary, top_k)#

Get best result in top-K entries.

First sort the results according to the primary metric, then take the best according to the secondary metric from the top K.

Parameters:
  • primary (Metric) – Metric to first sort by.

  • secondary (Metric) – Metric used to sort the top-K entries a second time; the best according to this metric is selected.

  • top_k (int) – Number of entries to consider.

Returns:

A tuple with the parameters, the fold ID and the scores.

Return type:

ResultTuple

get_best_result(measure)#

Get the hyperparameter combination for the best performance of a measure.

Parameters:

measure (Metric) – the metric to get the best result for

Return type:

ResultTuple

class CrossValidator(model, hyperparams, folds=3, max_parallel=0)#

Bases: object

A simple approach to Cross Validation.

The CrossValidator object is used to run cross-validation on a model. Results are returned in a CVResults object.

import ethicml as em
from ethicml import data, metrics, models
from ethicml.run import CrossValidator

train, test = em.train_test_split(data.Compas().load())
hyperparams = {"C": [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]}

lr_cv = CrossValidator(models.LR, hyperparams, folds=3)

primary = metrics.Accuracy()
fair_measure = metrics.AbsCV()
cv_results = lr_cv.run(train, measures=[primary, fair_measure])
Parameters:
  • model (Type[InAlgorithm]) – the class (not an instance) of the model for cross validation

  • hyperparams (Mapping[str, Sequence[Any]]) – a dictionary where the keys are the names of hyperparameters and the values are lists of possible values for the hyperparameters

  • folds (int) – the number of folds

  • max_parallel (int) – the maximum number of parallel processes; if set to 0, the number of available CPUs is used

run(train, measures=None)#

Run the cross validation experiments.

Parameters:
  • train (DataTuple) – the training data

  • measures (list[Metric] | None) – metrics to compute for each hyper-parameter setting (Default: None)

Return type:

CVResults

run_async(train, measures=None)#

Run the cross validation experiments asynchronously.

Parameters:
  • train (DataTuple) – the training data

  • measures (list[Metric] | None) – metrics to compute for each hyper-parameter setting (Default: None)

Returns:

A CVResults object containing the cross-validation results.

Return type:

CVResults
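
A minimal sketch of driving the asynchronous variant, assuming run_async() is a coroutine and reusing lr_cv, train, primary and fair_measure from the CrossValidator example:

import asyncio

# run the cross validation inside an asyncio event loop
cv_results = asyncio.run(lr_cv.run_async(train, measures=[primary, fair_measure]))
print(cv_results.mean_storage)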

arrange_in_parallel(algos, data, seeds, num_jobs=None)#

Arrange the given algorithms to run (embarrassingly) parallel.

Parameters:
  • algos (Sequence[Algorithm[_RT]]) – List of tuples consisting of a run_async function of an algorithm and a name.

  • data (Sequence[TrainValPair]) – List of pairs of data tuples (train and test).

  • seeds (list[int]) – List of random seeds.

  • num_jobs (int | None) – Number of parallel jobs. None means as many as available CPUs. (Default: None)

Returns:

list of the results

Return type:

list[list[_RT]]

evaluate_models(datasets, *, preprocess_models=(), inprocess_models=(), metrics=(), per_sens_metrics=(), repeats=1, test_mode=False, delete_previous=True, splitter=None, topic=None, fair_pipeline=True, num_jobs=None, scaler=None, repeat_on='both')#

Evaluate all the given models for all the given datasets and compute all the given metrics.

Parameters:
  • datasets (list[Dataset]) – List of dataset objects.

  • preprocess_models (Sequence[PreAlgorithm]) – List of preprocess model objects. (Default: ())

  • inprocess_models (Sequence[InAlgorithm]) – List of inprocess model objects. (Default: ())

  • metrics (Sequence[Metric]) – List of metric objects. (Default: ())

  • per_sens_metrics (Sequence[Metric]) – List of metric objects that will be evaluated per sensitive attribute. (Default: ())

  • repeats (int) – Number of repeats to perform for the experiments. (Default: 1)

  • test_mode (bool) – If True, only use a small subset of the data so that the models run faster. (Default: False)

  • delete_previous (bool) – If True, delete previous results in the directory. (Default: True)

  • splitter (DataSplitter | None) – Custom train-test splitter. (Default: None)

  • topic (str | None) – A string that identifies the run; the string is prepended to the filename. (Default: None)

  • fair_pipeline (bool) – If True, run fair inprocess algorithms on the output of preprocessing. (Default: True)

  • num_jobs (int | None) – Number of parallel jobs; if None, the number of CPUs is used. (Default: None)

  • scaler (ScalerType | None) – Sklearn-style scaler to be used on the continuous features. (Default: None)

  • repeat_on (Literal['data', 'model', 'both']) – Whether to vary the data seed, the model seed, or both between repeats. (Default: “both”)

Returns:

A Results object.

Return type:

Results
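
A sketch of a typical call; the Adult dataset and the ProbPos per-sensitive metric are example choices not mentioned in this section, so treat them as assumptions:

from ethicml import data, metrics, models
from ethicml.run import evaluate_models

results = evaluate_models(
    datasets=[data.Adult()],
    inprocess_models=[models.LR()],
    metrics=[metrics.Accuracy(), metrics.AbsCV()],
    per_sens_metrics=[metrics.Accuracy(), metrics.ProbPos()],
    repeats=2,
)
print(results)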

load_results(dataset_name, transform_name, topic=None, outdir=PosixPath('results'))#

Load results from a CSV file that was created by evaluate_models().

Parameters:
  • dataset_name (str) – name of the dataset of the results

  • transform_name (str) – name of the transformation that was used for the results

  • topic (str | None) – (optional) topic string of the results (Default: None)

  • outdir (Path) – directory where the results are stored (Default: Path(“.”) / “results”)

Returns:

A Results object if the file exists; None otherwise

Return type:

Results | None
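
A sketch of reloading previously saved results; the dataset and transformation names below are placeholders and must match the names under which evaluate_models() stored its CSV:

from ethicml.run import load_results

# placeholder names; substitute the dataset and transform of your own run
results = load_results("Adult Sex", "no_transform")
if results is not None:
    print(results)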

run_in_parallel(algos, *, data, seeds, num_jobs=None)#

Run the given algorithms (embarrassingly) parallel.

Parameters:
  • algos (Sequence[InAlgorithm] | Sequence[PreAlgorithm]) – List of algorithms.

  • data (Sequence[TrainValPair]) – List of pairs of data tuples (train and test).

  • seeds (list[int]) – List of seeds to use when running the model.

  • num_jobs (int | None) – Maximum number of parallel jobs; if None, use the number of available CPUs. (Default: None)

Returns:

list of the results

Return type:

List[List[Prediction]] | List[List[Tuple[DataTuple, DataTuple]]]
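
A hedged sketch, assuming the TrainValPair type referenced above can be constructed directly from a (train, test) pair; check ethicml.utility for the exact constructor:

import ethicml as em
from ethicml import data, models
from ethicml.run import run_in_parallel

train, test = em.train_test_split(data.Compas().load())
pair = em.TrainValPair(train, test)  # assumed constructor for the (train, test) pair

# run two inprocess models on the same train/test pair, one seed per data pair
predictions = run_in_parallel(
    [models.LR(), models.SVM()],
    data=[pair],
    seeds=[0],
    num_jobs=2,
)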