Model evaluation

Runs the given metrics on the given algorithms for the given datasets.



  • evaluate_models – Evaluate all the given models for all the given datasets and compute all the given metrics.

  • evaluate_models_async – Evaluate all the given models for all the given datasets and compute all the given metrics.

  • load_results – Load results from a CSV file that was created by evaluate_models.

  • run_metrics – Run all the given metrics on the given predictions and return the results.

evaluate_models(datasets, preprocess_models=(), inprocess_models=(), metrics=(), per_sens_metrics=(), repeats=1, test_mode=False, delete_prev=False, splitter=None, topic=None, fair_pipeline=True, scaler=None, dataset_based_results=True)

Evaluate all the given models for all the given datasets and compute all the given metrics.

Parameters

  • datasets (List[]) – list of dataset objects

  • scaler (Optional[ethicml.preprocessing.scaling.ScalerType]) – scaler to use on the continuous features of the dataset.

  • preprocess_models (Sequence[ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm]) – list of preprocess model objects

  • inprocess_models (Sequence[ethicml.algorithms.inprocess.in_algorithm.InAlgorithm]) – list of inprocess model objects

  • metrics (Sequence[ethicml.metrics.metric.Metric]) – list of metric objects

  • per_sens_metrics (Sequence[ethicml.metrics.metric.Metric]) – list of metric objects that will be evaluated per sensitive attribute

  • repeats (int) – number of repeats to perform for the experiments

  • test_mode (bool) – if True, only use a small subset of the data so that the models run faster

  • delete_prev (bool) – if True, delete previously saved results in the output directory. Defaults to False.

  • splitter (Optional[ethicml.preprocessing.train_test_split.DataSplitter]) – (optional) custom train-test splitter

  • topic (Optional[str]) – (optional) a string that identifies the run; the string is prepended to the filename

  • fair_pipeline (bool) – if True, run fair inprocess algorithms on the output of preprocessing

  • dataset_based_results (bool) – if True, use the name of the sensitive variable in the returned results. If False, refer to the sensitive variable as S.

Return type


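The description of evaluate_models amounts to a nested loop: every dataset, every repeat, every model, every metric. The following is a minimal, library-free sketch of that loop under stated assumptions. ToyDataset, split, threshold_model, accuracy and evaluate are hypothetical stand-ins, not EthicML classes or functions, and the sketch deliberately leaves out preprocessing, scaling and fair_pipeline. It only shows how repeats, test_mode and a seeded train-test split could combine into one result row per (dataset, repeat, model).

```python
"""A library-free sketch of an "evaluate everything" loop.

All names here (ToyDataset, split, threshold_model, accuracy, evaluate) are
hypothetical stand-ins, not part of EthicML.
"""
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, int, int]  # (feature x, sensitive attribute s, label y)
Model = Callable[[List[Row], List[Row]], List[int]]  # fit on train, predict on test
Metric = Callable[[List[int], List[Row]], float]  # score predictions against test rows


@dataclass
class ToyDataset:
    name: str
    rows: List[Row]


def split(rows: List[Row], seed: int, train_frac: float = 0.8) -> Tuple[List[Row], List[Row]]:
    """Hypothetical splitter: shuffle with a per-repeat seed, then cut into train/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def threshold_model(train: List[Row], test: List[Row]) -> List[int]:
    """Predict 1 whenever x exceeds the mean x of the training rows."""
    mean_x = sum(x for x, _, _ in train) / len(train)
    return [int(x > mean_x) for x, _, _ in test]


def accuracy(preds: List[int], test: List[Row]) -> float:
    return sum(int(p == y) for p, (_, _, y) in zip(preds, test)) / len(test)


def evaluate(
    datasets: List[ToyDataset],
    models: Dict[str, Model],
    metrics: Dict[str, Metric],
    repeats: int = 1,
    test_mode: bool = False,
) -> List[Dict[str, object]]:
    """Return one result row per (dataset, repeat, model), holding every metric."""
    results: List[Dict[str, object]] = []
    for data in datasets:
        # In test mode, keep only a small prefix of the data so runs finish quickly.
        rows = data.rows[:20] if test_mode else data.rows
        for repeat in range(repeats):
            # A fresh split per repeat; seeding by repeat index keeps runs reproducible.
            train, test = split(rows, seed=repeat)
            for model_name, model in models.items():
                preds = model(train, test)
                row: Dict[str, object] = {"dataset": data.name, "repeat": repeat, "model": model_name}
                row.update({name: fn(preds, test) for name, fn in metrics.items()})
                results.append(row)
    return results


if __name__ == "__main__":
    toy = ToyDataset("toy", [(float(i), i % 2, int(i >= 25)) for i in range(50)])
    for result in evaluate([toy], {"threshold": threshold_model}, {"Accuracy": accuracy}, repeats=2):
        print(result)
```

Seeding the split by the repeat index is one simple way to make repeated runs differ from each other while staying reproducible; a custom splitter would replace the split helper.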
evaluate_models_async(*, datasets, preprocess_models=(), inprocess_models=(), metrics=(), per_sens_metrics=(), repeats=1, test_mode=False, delete_prev=False, splitter=None, topic=None, fair_pipeline=True, num_cpus=1, scaler=None)

Evaluate all the given models for all the given datasets and compute all the given metrics, running the evaluations asynchronously on up to num_cpus CPUs.

Parameters

  • datasets (List[]) – list of dataset objects

  • scaler (Optional[ethicml.preprocessing.scaling.ScalerType]) – scikit-learn-style scaler to use on the continuous features of the dataset.

  • preprocess_models (Sequence[ethicml.algorithms.preprocess.pre_algorithm.PreAlgorithm]) – list of preprocess model objects

  • inprocess_models (Sequence[ethicml.algorithms.inprocess.in_algorithm.InAlgorithm]) – list of inprocess model objects

  • metrics (Sequence[ethicml.metrics.metric.Metric]) – list of metric objects

  • per_sens_metrics (Sequence[ethicml.metrics.metric.Metric]) – list of metric objects that will be evaluated per sensitive attribute

  • repeats (int) – number of repeats to perform for the experiments

  • test_mode (bool) – if True, only use a small subset of the data so that the models run faster

  • delete_prev (bool) – if True, delete previously saved results in the output directory. Defaults to False.

  • splitter (Optional[ethicml.preprocessing.train_test_split.DataSplitter]) – (optional) custom train-test splitter

  • topic (Optional[str]) – (optional) a string that identifies the run; the string is prepended to the filename

  • fair_pipeline (bool) – if True, run fair inprocess algorithms on the output of preprocessing

  • num_cpus (int) – number of CPUs to use

Return type


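The num_cpus parameter caps how much work runs at once. The sketch below shows one common way to enforce such a cap with asyncio.Semaphore; it is an assumption about the general technique, not a description of how evaluate_models_async is implemented. run_bounded and make_job are hypothetical names, and asyncio.sleep stands in for waiting on a model-training process.

```python
"""Sketch: run many jobs concurrently, never more than `num_cpus` at once.

This is an assumption about how a `num_cpus` limit could be enforced; it is
not EthicML's implementation. `run_bounded` and `make_job` are hypothetical.
"""
import asyncio
from typing import Awaitable, Callable, Dict, List, TypeVar

T = TypeVar("T")


async def run_bounded(jobs: List[Callable[[], Awaitable[T]]], num_cpus: int = 1) -> List[T]:
    """Run every job with at most `num_cpus` in flight; results keep the input order."""
    semaphore = asyncio.Semaphore(num_cpus)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # wait here until one of the num_cpus slots is free
            return await job()

    return list(await asyncio.gather(*(guarded(job) for job in jobs)))


def make_job(name: str, tracker: Dict[str, int]) -> Callable[[], Awaitable[str]]:
    """Build a fake "fit a model" job that records how many jobs overlap."""

    async def job() -> str:
        tracker["running"] += 1
        tracker["peak"] = max(tracker["peak"], tracker["running"])
        await asyncio.sleep(0.01)  # stand-in for waiting on a model-training process
        tracker["running"] -= 1
        return name

    return job


if __name__ == "__main__":
    tracker = {"running": 0, "peak": 0}
    jobs = [make_job(f"model_{i}", tracker) for i in range(6)]
    print(asyncio.run(run_bounded(jobs, num_cpus=2)), "peak concurrency:", tracker["peak"])
```

asyncio.gather returns results in the order the jobs were passed in, regardless of which job finishes first, so results can be matched back to their models.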
load_results(dataset_name, transform_name, topic=None, outdir=PosixPath('results'))

Load results from a CSV file that was created by evaluate_models.

Parameters

  • dataset_name (str) – name of the dataset of the results

  • transform_name (str) – name of the transformation that was used for the results

  • topic (Optional[str]) – (optional) topic string of the results

  • outdir (pathlib.Path) – directory where the results are stored


Returns

DataFrame if the file exists; None otherwise

Return type


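The load_results contract (look up a file under outdir, with the topic prepended to the filename, and return None if it is missing) can be sketched as follows. The exact filename pattern in results_filename is a hypothetical stand-in, since the documentation above only states that the topic is prepended. To keep the sketch dependency-free it reads rows with the standard csv module, so it returns a list of dicts of strings rather than a DataFrame.

```python
"""Sketch of loading a results CSV, returning None when the file is missing.

The filename scheme in `results_filename` is a hypothetical stand-in; the
only documented fact is that the topic string is prepended to the filename.
The standard `csv` module replaces pandas here, so rows are dicts of strings.
"""
import csv
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def results_filename(dataset_name: str, transform_name: str, topic: Optional[str] = None) -> str:
    """Hypothetical naming scheme: '<topic>_<dataset>_<transform>.csv'."""
    base = f"{dataset_name}_{transform_name}.csv"
    return f"{topic}_{base}" if topic else base


def load_results_sketch(
    dataset_name: str,
    transform_name: str,
    topic: Optional[str] = None,
    outdir: Path = Path("results"),
) -> Optional[List[Dict[str, str]]]:
    path = outdir / results_filename(dataset_name, transform_name, topic)
    if not path.exists():
        return None  # mirrors "None otherwise"
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        outdir = Path(tmp)
        print(load_results_sketch("toy", "none", outdir=outdir))  # no file yet, so None
        (outdir / results_filename("toy", "none", topic="run1")).write_text("model,Accuracy\nLR,0.85\n")
        print(load_results_sketch("toy", "none", topic="run1", outdir=outdir))
```

Returning None instead of raising lets callers check for missing results with a simple `is None` test.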
run_metrics(predictions, actual, metrics=(), per_sens_metrics=(), diffs_and_ratios=True, use_sens_name=True)

Run all the given metrics on the given predictions and return the results.

Return type

Dict[str, float]
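The run_metrics signature suggests two kinds of results in the returned Dict[str, float]: overall metrics, and per_sens_metrics evaluated separately for each sensitive group, optionally followed by between-group differences and ratios. The sketch below illustrates that idea for a binary sensitive attribute. run_metrics_sketch, the metric helpers and the key format are hypothetical, and using absolute difference and a min/max ratio is a design choice made here, not necessarily how EthicML defines its differences and ratios.

```python
"""Sketch of overall plus per-sensitive-group metrics, with differences and ratios.

`run_metrics_sketch`, the metric helpers and the result-key format are all
hypothetical; they illustrate the idea behind `per_sens_metrics`,
`diffs_and_ratios` and `use_sens_name`, not EthicML's exact output.
"""
from typing import Callable, Dict, List, Sequence

Metric = Callable[[Sequence[int], Sequence[int]], float]  # (predictions, labels) -> score


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(preds)


def prob_pos(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of positive predictions (labels are ignored)."""
    return sum(preds) / len(preds)


def run_metrics_sketch(
    preds: Sequence[int],
    labels: Sequence[int],
    sens: Sequence[int],
    metrics: Dict[str, Metric],
    per_sens_metrics: Dict[str, Metric],
    sens_name: str = "sensitive",
    diffs_and_ratios: bool = True,
    use_sens_name: bool = True,
) -> Dict[str, float]:
    # Overall metrics are computed once on all rows.
    results: Dict[str, float] = {name: fn(preds, labels) for name, fn in metrics.items()}
    label = sens_name if use_sens_name else "S"  # generic "S" when names are disabled
    groups = sorted(set(sens))
    for name, fn in per_sens_metrics.items():
        per_group: Dict[int, float] = {}
        for g in groups:
            idx: List[int] = [i for i, s in enumerate(sens) if s == g]
            per_group[g] = fn([preds[i] for i in idx], [labels[i] for i in idx])
            results[f"{name}_{label}_{g}"] = per_group[g]
        if diffs_and_ratios and len(groups) == 2:
            first, second = f"{label}_{groups[0]}", f"{label}_{groups[1]}"
            a, b = per_group[groups[0]], per_group[groups[1]]
            results[f"{name}_{first}-{second}"] = abs(a - b)
            # Order-independent min/max ratio: 1.0 means parity; two zero scores count as parity.
            low, high = min(a, b), max(a, b)
            results[f"{name}_{first}/{second}"] = low / high if high > 0 else 1.0
    return results


if __name__ == "__main__":
    preds, labels, sens = [1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 1]
    print(run_metrics_sketch(preds, labels, sens, {"Accuracy": accuracy}, {"prob_pos": prob_pos}, use_sens_name=False))
```

With use_sens_name=False every key uses the generic S, which makes results from datasets with differently named sensitive attributes easier to compare side by side.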