Cross-validation#
Cross-validation for any in-process algorithm (at the moment, only in-process algorithms are supported).
Classes:

CVResults: Stores the results of a cross validation experiment.

CrossValidator: A simple approach to Cross Validation.
- class CVResults(results, model)#
Bases: object
Stores the results of a cross validation experiment.
This object isn’t meant to be iterated over directly. Instead, use the raw_storage property to access the results across all folds. Or, use the mean_storage property to access the average results for each parameter setting.
import ethicml as em

train, test = em.train_test_split(em.Compas().load())
hyperparams = {"C": [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]}
cv = em.CrossValidator(em.LR, hyperparams, folds=3)

primary = em.Accuracy()
fair_measure = em.AbsCV()
cv_results = cv.run(train, measures=[primary, fair_measure])
best_result = cv_results.get_best_in_top_k(primary, fair_measure, top_k=3)

print(f"Best C: {best_result.params['C']}")
print(f"Best Accuracy: {best_result.scores['Accuracy']}")
print(f"Best CV Score: {best_result.scores['CV absolute']}")
print(cv_results.mean_storage)
print(cv_results.raw_storage)
- Parameters
results (List[ethicml.evaluators.cross_validator.ResultTuple]) –
model (Type[ethicml.algorithms.inprocess.in_algorithm.InAlgorithm]) –
- best(measure)#
Returns a model initialised with the hyper-parameters that perform optimally on average across folds for a given metric.
- Parameters
measure (ethicml.metrics.metric.Metric) –
- Return type
ethicml.algorithms.inprocess.in_algorithm.InAlgorithm
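A minimal usage sketch (not taken from the library documentation), reusing cv_results, primary, train and test from the example above; it assumes the returned model follows the usual in-process interface with a run(train, test) method:

best_model = cv_results.best(primary)       # model built with the best hyper-parameters for this metric
predictions = best_model.run(train, test)   # assumption: the returned InAlgorithm exposes run(train, test)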
- best_hyper_params(measure)#
Get hyper-parameters that return the ‘best’ result for the metric of interest.
- Parameters
measure (ethicml.metrics.metric.Metric) –
- Return type
Dict[str, Any]
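For example, a sketch continuing the snippet above; it assumes the model class accepts the returned hyper-parameters as keyword arguments:

best_params = cv_results.best_hyper_params(primary)   # e.g. {"C": 0.1}
model = em.LR(**best_params)                           # assumption: em.LR takes "C" as a keyword argument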
- get_best_in_top_k(primary, secondary, top_k)#
Get the best result among the top-K entries.
First sort the results according to the primary metric, then take the best according to the secondary metric from the top K.
- Parameters
primary (ethicml.metrics.metric.Metric) – Metric to first sort by.
secondary (ethicml.metrics.metric.Metric) – Metric used to re-rank the top-K entries; the best according to this metric is selected.
top_k (int) – Number of entries to consider.
- Return type
ethicml.evaluators.cross_validator.ResultTuple
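The selection can be pictured as a two-stage sort. The sketch below illustrates the idea on hypothetical Candidate records (a stand-in for ResultTuple, not the library's internal code) and assumes that higher scores are better for both metrics:

from typing import Any, Dict, List, NamedTuple


class Candidate(NamedTuple):
    """Stand-in for ResultTuple: a hyper-parameter setting and its scores."""
    params: Dict[str, Any]
    scores: Dict[str, float]


def best_in_top_k(results: List[Candidate], primary: str, secondary: str, top_k: int) -> Candidate:
    # Stage 1: sort by the primary metric (higher is better here) and keep the top-K entries.
    top = sorted(results, key=lambda r: r.scores[primary], reverse=True)[:top_k]
    # Stage 2: of those K entries, return the one with the best secondary score.
    return max(top, key=lambda r: r.scores[secondary])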
- get_best_result(measure)#
Get the hyper-parameter combination with the best performance on a given measure.
- Parameters
measure (ethicml.metrics.metric.Metric) –
- Return type
ethicml.evaluators.cross_validator.ResultTuple
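A short usage sketch, continuing the example above; params and scores are the fields shown in that example:

best = cv_results.get_best_result(primary)
print(best.params)                  # hyper-parameter combination, e.g. {"C": 1}
print(best.scores["Accuracy"])      # score recorded for that combination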
- class CrossValidator(model, hyperparams, folds=3, max_parallel=0)#
Bases: object
A simple approach to Cross Validation.
The CrossValidator object is used to run cross-validation on a model. Results are returned in a CVResults object.
import ethicml as em

train, test = em.train_test_split(em.Compas().load())
hyperparams = {"C": [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]}
lr_cv = em.CrossValidator(em.LR, hyperparams, folds=3)

primary = em.Accuracy()
fair_measure = em.AbsCV()
cv_results = lr_cv.run(train, measures=[primary, fair_measure])
The constructor takes the following arguments.
- Parameters
model (Type[ethicml.algorithms.inprocess.in_algorithm.InAlgorithm]) – the class (not an instance) of the model for cross validation
hyperparams (Mapping[str, Sequence[Any]]) – a dictionary where the keys are the names of hyperparameters and the values are lists of possible values for the hyperparameters
folds (int) – the number of folds
max_parallel (int) – the maximum number of parallel processes; if set to 0, use the default which is the number of available CPUs
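As a sketch of a larger search space (the values here are illustrative, and it is assumed that every combination of the listed values is evaluated):

hyperparams = {
    "C": [1, 1e-1, 1e-2],
    # further hyper-parameters of the chosen model would be listed the same way,
    # e.g. "some_other_param": [value_a, value_b]  (hypothetical name)
}
cv = em.CrossValidator(em.LR, hyperparams, folds=5, max_parallel=2)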
- run(train, measures=None)#
Run the cross validation experiments.
- Parameters
train (ethicml.utility.data_structures.DataTuple) – the training data
measures (Optional[List[ethicml.metrics.metric.Metric]]) – an optional list of metrics to compute
- Returns
CVResults
- Return type
ethicml.evaluators.cross_validator.CVResults
- run_async(train, measures=None)#
Run the cross validation experiments asynchronously.
- Parameters
train (ethicml.utility.data_structures.DataTuple) – the training data
measures (Optional[List[ethicml.metrics.metric.Metric]]) – an optional list of metrics to compute
- Returns
CVResults
- Return type
ethicml.evaluators.cross_validator.CVResults
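A usage sketch, assuming run_async is a coroutine (consistent with "asynchronously" above) and reusing lr_cv, train, primary and fair_measure from the earlier example:

import asyncio

# Await the asynchronous cross-validation run; the result is the same kind of CVResults object.
cv_results = asyncio.run(lr_cv.run_async(train, measures=[primary, fair_measure]))
print(cv_results.mean_storage)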