Cross-validation#
Cross-validation for any in-process algorithm (at the moment, only in-process algorithms are supported).
Classes:

CVResults: Stores the results of a cross validation experiment.

CrossValidator: A simple approach to Cross Validation.
- class CVResults(results, model)#
Bases: object
Stores the results of a cross validation experiment.
This object isn’t meant to be iterated over directly. Instead, use the raw_storage property to access the results across all folds. Or, use the mean_storage property to access the average results for each parameter setting.
import ethicml as em

train, test = em.train_test_split(em.Compas().load())
hyperparams = {"C": [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]}
cv = em.CrossValidator(em.LR, hyperparams, folds=3)

primary = em.Accuracy()
fair_measure = em.AbsCV()
cv_results = cv.run(train, measures=[primary, fair_measure])
best_result = cv_results.get_best_in_top_k(primary, fair_measure, top_k=3)

print(f"Best C: {best_result.params['C']}")
print(f"Best Accuracy: {best_result.scores['Accuracy']}")
print(f"Best CV Score: {best_result.scores['CV absolute']}")
print(cv_results.mean_storage)
print(cv_results.raw_storage)
- Parameters
results (List[ethicml.evaluators.cross_validator.ResultTuple]) –
model (Type[ethicml.algorithms.inprocess.in_algorithm.InAlgorithm]) –
- best(measure)#
Returns a model initialised with the hyper-parameters that perform optimally on average across folds for a given metric.
- Parameters
measure (ethicml.metrics.metric.Metric) –
- Return type
ethicml.algorithms.inprocess.in_algorithm.InAlgorithm
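A minimal usage sketch (not taken from the library documentation), reusing cv_results, primary, train and test from the example above; it assumes the returned model follows the usual in-process interface with a run(train, test) method:

best_model = cv_results.best(primary)       # model built with the best hyper-parameters for this metric
predictions = best_model.run(train, test)   # assumption: the returned InAlgorithm exposes run(train, test)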
- best_hyper_params(measure)#
Get hyper-parameters that return the ‘best’ result for the metric of interest.
- Parameters
measure (ethicml.metrics.metric.Metric) –
- Return type
Dict[str, Any]
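For example, a sketch continuing the snippet above; it assumes the model class accepts the returned hyper-parameters as keyword arguments:

best_params = cv_results.best_hyper_params(primary)   # e.g. {"C": 0.1}
model = em.LR(**best_params)                           # assumption: em.LR takes "C" as a keyword argument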
- get_best_in_top_k(primary, secondary, top_k)#
Get the best result among the top-K entries.
First sort the results according to the primary metric, then take the best according to the secondary metric from the top K.
- Parameters
primary (ethicml.metrics.metric.Metric) – Metric to first sort by.
secondary (ethicml.metrics.metric.Metric) – Metric used to re-rank the top-K entries; the best according to this metric is selected.
top_k (int) – Number of entries to consider.
- Return type
ethicml.evaluators.cross_validator.ResultTuple
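The selection can be pictured as a two-stage sort. The sketch below illustrates the idea on hypothetical Candidate records (a stand-in for ResultTuple, not the library's internal code) and assumes that higher scores are better for both metrics:

from typing import Any, Dict, List, NamedTuple


class Candidate(NamedTuple):
    """Stand-in for ResultTuple: a hyper-parameter setting and its scores."""
    params: Dict[str, Any]
    scores: Dict[str, float]


def best_in_top_k(results: List[Candidate], primary: str, secondary: str, top_k: int) -> Candidate:
    # Stage 1: sort by the primary metric (higher is better here) and keep the top-K entries.
    top = sorted(results, key=lambda r: r.scores[primary], reverse=True)[:top_k]
    # Stage 2: of those K entries, return the one with the best secondary score.
    return max(top, key=lambda r: r.scores[secondary])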
- get_best_result(measure)#
Get the hyper-parameter combination with the best performance on a given measure.
- Parameters
measure (ethicml.metrics.metric.Metric) –
- Return type
ethicml.evaluators.cross_validator.ResultTuple
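A short usage sketch, continuing the example above; params and scores are the fields shown in that example:

best = cv_results.get_best_result(primary)
print(best.params)                  # hyper-parameter combination, e.g. {"C": 1}
print(best.scores["Accuracy"])      # score recorded for that combination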
- class CrossValidator(model, hyperparams, folds=3, max_parallel=0)#
Bases: object
A simple approach to Cross Validation.
The CrossValidator object is used to run cross-validation on a model. Results are returned in a CVResults object.
import ethicml as em

train, test = em.train_test_split(em.Compas().load())
hyperparams = {"C": [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]}
lr_cv = em.CrossValidator(em.LR, hyperparams, folds=3)

primary = em.Accuracy()
fair_measure = em.AbsCV()
cv_results = lr_cv.run(train, measures=[primary, fair_measure])
The constructor takes the following arguments.
- Parameters
model (Type[ethicml.algorithms.inprocess.in_algorithm.InAlgorithm]) – the class (not an instance) of the model for cross validation
hyperparams (Mapping[str, Sequence[Any]]) – a dictionary where the keys are the names of hyperparameters and the values are lists of possible values for the hyperparameters
folds (int) – the number of folds
max_parallel (int) – the maximum number of parallel processes; if set to 0, use the default which is the number of available CPUs
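As a sketch of a larger search space (the values here are illustrative, and it is assumed that every combination of the listed values is evaluated):

hyperparams = {
    "C": [1, 1e-1, 1e-2],
    # further hyper-parameters of the chosen model would be listed the same way,
    # e.g. "some_other_param": [value_a, value_b]  (hypothetical name)
}
cv = em.CrossValidator(em.LR, hyperparams, folds=5, max_parallel=2)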
- run(train, measures=None)#
Run the cross validation experiments.
- Parameters
train (ethicml.utility.data_structures.DataTuple) – the training data
measures (Optional[List[ethicml.metrics.metric.Metric]]) – an optional list of metrics to compute
- Returns
CVResults
- Return type
ethicml.evaluators.cross_validator.CVResults
- run_async(train, measures=None)#
Run the cross validation experiments asynchronously.
- Parameters
train (ethicml.utility.data_structures.DataTuple) – the training data
measures (Optional[List[ethicml.metrics.metric.Metric]]) – an optional list of metrics to compute
- Returns
CVResults
- Return type
ethicml.evaluators.cross_validator.CVResults
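A usage sketch, assuming run_async is a coroutine (consistent with "asynchronously" above) and reusing lr_cv, train, primary and fair_measure from the earlier example:

import asyncio

# Await the asynchronous cross-validation run; the result is the same kind of CVResults object.
cv_results = asyncio.run(lr_cv.run_async(train, measures=[primary, fair_measure]))
print(cv_results.mean_storage)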