ethicml.run#
Module for evaluators which apply algorithms over datasets and obtain metrics.
Classes:
CVResults: Stores the results of a cross validation experiment (see CrossValidator).
CrossValidator: A simple approach to Cross Validation.
Functions:
arrange_in_parallel(): Arrange the given algorithms to run (embarrassingly) parallel.
evaluate_models(): Evaluate all the given models for all the given datasets and compute all the given metrics.
load_results(): Load results from a CSV file that was created by evaluate_models().
run_in_parallel(): Run the given algorithms (embarrassingly) parallel.
- class CVResults(results, model)#
Bases: object
Stores the results of a cross validation experiment (see CrossValidator).
This object isn't meant to be iterated over directly. Instead, use the raw_storage property to access the results across all folds, or the mean_storage property to access the average results for each parameter setting.

    import ethicml as em
    from ethicml import data, metrics, models
    from ethicml.run import CrossValidator

    train, test = em.train_test_split(data.Compas().load())
    hyperparams = {"C": [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]}

    cv = CrossValidator(models.LR, hyperparams, folds=3)

    primary = metrics.Accuracy()
    fair_measure = metrics.AbsCV()
    cv_results = cv.run(train, measures=[primary, fair_measure])
    best_result = cv_results.get_best_in_top_k(primary, fair_measure, top_k=3)

    print(f"Best C: {best_result.params['C']}")
    print(f"Best Accuracy: {best_result.scores['Accuracy']}")
    print(f"Best CV Score: {best_result.scores['CV absolute']}")
    print(cv_results.mean_storage)
    print(cv_results.raw_storage)
- Parameters:
results (list[ResultTuple]) –
model (type[InAlgorithm]) –
- best(measure)#
Return a model initialised with the best hyper-parameters.
The best hyper-parameters are those that perform optimally on average across folds for a given metric.
- Parameters:
measure (Metric) –
- Return type:
InAlgorithm
- best_hyper_params(measure)#
Get hyper-parameters that return the ‘best’ result for the metric of interest.
- Parameters:
measure (Metric) –
- Return type:
dict[str, Any]
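A minimal sketch of how best() and best_hyper_params() might be used, continuing the cv_results from the class-level example above; the standard InAlgorithm.run(train, test) interface is assumed for the returned model:

    # Hyper-parameters that scored best on average across folds:
    best_params = cv_results.best_hyper_params(primary)

    # A fresh model instance initialised with those hyper-parameters
    # (assumed to follow the usual InAlgorithm interface):
    best_model = cv_results.best(primary)
    predictions = best_model.run(train, test)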
- get_best_in_top_k(primary, secondary, top_k)#
Get best result in top-K entries.
First sort the results according to the primary metric; then, from the top K of those, take the best according to the secondary metric.
- Parameters:
primary (Metric) –
secondary (Metric) –
top_k (int) –
- class CrossValidator(model, hyperparams, folds=3, max_parallel=0)#
Bases: object
A simple approach to Cross Validation.
The CrossValidator object is used to run cross-validation on a model. Results are returned in a CVResults object.

    import ethicml as em
    from ethicml import data, metrics, models
    from ethicml.run import CrossValidator

    train, test = em.train_test_split(data.Compas().load())
    hyperparams = {"C": [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]}

    lr_cv = CrossValidator(models.LR, hyperparams, folds=3)

    primary = metrics.Accuracy()
    fair_measure = metrics.AbsCV()
    cv_results = lr_cv.run(train, measures=[primary, fair_measure])
- Parameters:
model (type[InAlgorithm]) – the class (not an instance) of the model for cross validation
hyperparams (Mapping[str, Sequence[Any]]) – a dictionary where the keys are the names of hyperparameters and the values are lists of possible values for the hyperparameters
folds (int) – the number of folds
max_parallel (int) – the maximum number of parallel processes; if set to 0, use the default which is the number of available CPUs
- run(train, measures=None)#
Run the cross validation experiments.
- Parameters:
train (DataTuple) –
measures (list[Metric] | None) – (Default: None)
- arrange_in_parallel(algos, data, seeds, num_jobs=None)#
Arrange the given algorithms to run (embarrassingly) parallel.
- Parameters:
algos (Sequence[Algorithm[_RT]]) – List of tuples consisting of a run_async function of an algorithm and a name.
data (Sequence[TrainValPair]) – List of pairs of data tuples (train and test).
seeds (list[int]) – List of random seeds.
num_jobs (int | None) – Number of parallel jobs. None means as many as available CPUs. (Default: None)
- Returns:
list of the results
- Return type:
list[list[_RT]]
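A minimal sketch of a call, assuming pair_a and pair_b are TrainValPair objects built elsewhere and that the instantiated models satisfy the expected algorithm interface:

    from ethicml import models
    from ethicml.run import arrange_in_parallel

    # pair_a and pair_b are assumed TrainValPair objects built elsewhere.
    results = arrange_in_parallel(
        algos=[models.LR(), models.SVM()],
        data=[pair_a, pair_b],
        seeds=[0, 1],
        num_jobs=2,
    )
    # results: one inner list per algorithm, with one entry per data pair.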
- evaluate_models(datasets, *, preprocess_models=(), inprocess_models=(), metrics=(), per_sens_metrics=(), repeats=1, test_mode=False, delete_previous=True, splitter=None, topic=None, fair_pipeline=True, num_jobs=None, scaler=None, repeat_on='both')#
Evaluate all the given models for all the given datasets and compute all the given metrics.
- Parameters:
datasets (list[Dataset]) – List of dataset objects.
preprocess_models (Sequence[PreAlgorithm]) – List of preprocess model objects. (Default: ())
inprocess_models (Sequence[InAlgorithm]) – List of inprocess model objects. (Default: ())
metrics (Sequence[Metric]) – List of metric objects. (Default: ())
per_sens_metrics (Sequence[Metric]) – List of metric objects that will be evaluated per sensitive attribute. (Default: ())
repeats (int) – Number of repeats to perform for the experiments. (Default: 1)
test_mode (bool) – If True, only use a small subset of the data so that the models run faster. (Default: False)
delete_previous (bool) – True by default. If True, delete previous results in the directory.
splitter (DataSplitter | None) – Custom train-test splitter. (Default: None)
topic (str | None) – A string that identifies the run; the string is prepended to the filename. (Default: None)
fair_pipeline (bool) – if True, run fair inprocess algorithms on the output of preprocessing. (Default: True)
num_jobs (int | None) – Number of parallel jobs; if None, the number of CPUs is used. (Default: None)
scaler (ScalerType | None) – Sklearn-style scaler to be used on the continuous features. (Default: None)
repeat_on (Literal['data', 'model', 'both']) – Should the data seed or the model seed be varied for each run? Or should they both be the same? (Default: "both")
- Returns:
A Results object.
- Return type:
Results
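A minimal sketch of a typical call; the particular dataset, model, and metrics are illustrative, and metrics.ProbPos() is assumed to be usable as a per-sensitive-attribute metric:

    from ethicml import data, metrics, models
    from ethicml.run import evaluate_models

    results = evaluate_models(
        datasets=[data.Compas()],
        inprocess_models=[models.LR()],
        metrics=[metrics.Accuracy()],
        per_sens_metrics=[metrics.ProbPos()],  # computed per sensitive group
        repeats=3,
    )
    print(results)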
- load_results(dataset_name, transform_name, topic=None, outdir=PosixPath('results'))#
Load results from a CSV file that was created by evaluate_models().
- Parameters:
dataset_name (str) – name of the dataset of the results
transform_name (str) – name of the transformation that was used for the results
topic (str | None) – (optional) topic string of the results (Default: None)
outdir (Path) – directory where the results are stored (Default: Path(“.”) / “results”)
- Returns:
DataFrame if the file exists; None otherwise
- Return type:
Results | None
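A minimal sketch of loading previously saved results; the transform name "no_transform" is an assumption about how runs without preprocessing are labelled, and should match whatever evaluate_models() actually wrote:

    from ethicml.run import load_results

    # "no_transform" is an assumed transform name for unpreprocessed runs.
    results = load_results("Compas", "no_transform")
    if results is not None:
        print(results)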
- run_in_parallel(algos, *, data, seeds, num_jobs=None)#
Run the given algorithms (embarrassingly) parallel.
- Parameters:
algos (Sequence[InAlgorithm] | Sequence[PreAlgorithm]) – List of algorithms.
data (Sequence[TrainValPair]) – List of pairs of data tuples (train and test).
seeds (list[int]) – List of seeds to use when running the model.
num_jobs (int | None) – How many jobs can run in parallel at most. If None, use the number of CPUs. (Default: None)
- Returns:
list of the results
- Return type:
list[list[Prediction]] | list[list[tuple[DataTuple, DataTuple]]]
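A minimal sketch; pairs is assumed to be a pre-built sequence of TrainValPair objects, e.g. produced by a data splitter:

    from ethicml import models
    from ethicml.run import run_in_parallel

    # `pairs` is an assumed, pre-built sequence of TrainValPair objects.
    predictions = run_in_parallel(
        [models.LR(), models.SVM()],
        data=pairs,
        seeds=[0, 1],
        num_jobs=2,
    )
    # For in-process models, predictions[i][j] is the Prediction of
    # algorithm i on data pair j.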