Metrics

This module contains metrics which can be applied to prediction results.

Example usage:

import ethicml as em

# `predictions` is a Prediction produced by a model; `test_data` is the DataTuple it was evaluated on
em.run_metrics(predictions, test_data, metrics=[em.Accuracy(), em.TPR(), em.ProbPos()])

Classes:

`AS`	Anti-spurious metric.
`AbsCV`	Absolute value of Calders-Verwer.
`Accuracy`	Classification accuracy.
`AverageOddsDiff`	Average Odds Difference.
`BCR`	Balanced Classification Rate.
`BalancedAccuracy`	Accuracy that is balanced with respect to the class labels.
`CV`	Calders-Verwer.
`DependencyTarget`	The variable that is compared to the predictions in order to check how similar they are.
`F1`	F1 score: harmonic mean of precision and recall.
`FNR`	False negative rate.
`FPR`	False positive rate.
`Hsic`	Hilbert-Schmidt Independence Criterion (HSIC).
`Metric`	Base class for all metrics.
`NMI`	Normalized Mutual Information.
`NPV`	Negative predictive value.
`PPV`	Positive predictive value.
`ProbNeg`	Probability of negative prediction.
`ProbOutcome`	Mean of logits.
`ProbPos`	Probability of positive prediction.
`RenyiCorrelation`	Renyi correlation.
`SklearnMetric`	Wrapper around an sklearn metric.
`TNR`	True negative rate.
`TPR`	True positive rate.
`Theil`	Theil Index.
`Yanovich`	Yanovich Metric.

Exceptions:

LabelOutOfBounds

Metric not applicable per sensitive attribute; apply it to the whole dataset instead.

Functions:

confusion_matrix

Apply scikit-learn's confusion matrix.

class AS

Bases: ethicml.metrics.metric.BaseMetric

Anti-spurious metric.

Computes \(P(\hat{y}=y|y\neq s)\).

Return type: None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float
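To make the formula concrete, here is a minimal pure-Python sketch that estimates \(P(\hat{y}=y|y\neq s)\) from plain lists of binary values. The function name `anti_spurious` and the list-based inputs are illustrative assumptions; the ethicml class itself takes a Prediction and a DataTuple, and this is not its implementation.

```python
from typing import Sequence


def anti_spurious(y_pred: Sequence[int], y_true: Sequence[int], s: Sequence[int]) -> float:
    """Estimate P(y_hat = y | y != s) from parallel sequences of binary values (hypothetical helper)."""
    # Keep only the samples whose label disagrees with their sensitive attribute
    selected = [(p, t) for p, t, a in zip(y_pred, y_true, s) if t != a]
    if not selected:
        raise ValueError("no samples with y != s")
    # Fraction of those samples that were predicted correctly
    correct = sum(1 for p, t in selected if p == t)
    return correct / len(selected)


# The first and third samples have y != s; only the first is predicted correctly
print(anti_spurious(y_pred=[1, 0, 0, 1], y_true=[1, 0, 1, 1], s=[0, 0, 0, 1]))  # 0.5
```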

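class AbsCV(pos_class=1, labels=None)

Bases: ethicml.metrics.cv.CV

Absolute value of Calders-Verwer.

Because the absolute value of the difference between groups is used, the score does not depend on which group receives more positive predictions, which makes results easier to compare.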

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

class Accuracy

Bases: ethicml.metrics.accuracy.SklearnMetric

Classification accuracy.

Return type: None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

class AverageOddsDiff(pos_class=1, labels=None)

Bases: ethicml.metrics.metric.CfmMetric

Average Odds Difference.

\(\tfrac{1}{2}\left[(FPR_{s=0} - FPR_{s=1}) + (TPR_{s=0} - TPR_{s=1})\right]\).

A value of 0 indicates equality of odds.

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float
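The Average Odds Difference formula can be sketched in pure Python as follows, assuming binary labels, a binary sensitive attribute coded 0/1, and plain lists instead of ethicml's Prediction and DataTuple. The helper names are hypothetical, and the sign convention (group s=0 minus group s=1) simply follows the formula above.

```python
from typing import Sequence, Tuple


def fpr_tpr(y_pred: Sequence[int], y_true: Sequence[int], s: Sequence[int],
            group: int, pos_class: int = 1) -> Tuple[float, float]:
    """Return (FPR, TPR) computed only on samples whose sensitive attribute equals `group`."""
    tp = fn = fp = tn = 0
    for p, t, a in zip(y_pred, y_true, s):
        if a != group:
            continue
        if t == pos_class:
            if p == pos_class:
                tp += 1
            else:
                fn += 1
        elif p == pos_class:
            fp += 1
        else:
            tn += 1
    # Assumes each group contains at least one positive and one negative label
    return fp / (fp + tn), tp / (tp + fn)


def average_odds_diff(y_pred, y_true, s, pos_class: int = 1) -> float:
    """1/2 * [(FPR_{s=0} - FPR_{s=1}) + (TPR_{s=0} - TPR_{s=1})]."""
    fpr0, tpr0 = fpr_tpr(y_pred, y_true, s, group=0, pos_class=pos_class)
    fpr1, tpr1 = fpr_tpr(y_pred, y_true, s, group=1, pos_class=pos_class)
    return 0.5 * ((fpr0 - fpr1) + (tpr0 - tpr1))


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
print(average_odds_diff(y_pred, y_true, s))  # 0.5
```

Swapping which group is coded 0 flips the sign, so a value far from 0 in either direction indicates unequal odds.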

class BCR(pos_class=1, labels=None)

Bases: ethicml.metrics.metric.CfmMetric

Balanced Classification Rate.

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

class BalancedAccuracy(pos_class=1, labels=None)

Bases: ethicml.metrics.metric.CfmMetric

Accuracy that is balanced with respect to the class labels.

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float
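For binary labels, a balanced classification rate is commonly defined as the mean of the true positive rate and the true negative rate, which is also the usual definition of balanced accuracy. The sketch below assumes that definition; it is not taken from ethicml, and this reference does not state whether `BCR` and `BalancedAccuracy` compute exactly this. The function name is hypothetical.

```python
from typing import Sequence


def balanced_classification_rate(y_pred: Sequence[int], y_true: Sequence[int],
                                 pos_class: int = 1) -> float:
    """Mean of the true positive rate and the true negative rate (binary labels assumed)."""
    pairs = list(zip(y_pred, y_true))
    # Predictions for the truly positive and truly negative samples
    positives = [p for p, t in pairs if t == pos_class]
    negatives = [p for p, t in pairs if t != pos_class]
    tpr = sum(1 for p in positives if p == pos_class) / len(positives)
    tnr = sum(1 for p in negatives if p != pos_class) / len(negatives)
    return 0.5 * (tpr + tnr)


# Always predicting the majority class: plain accuracy is 0.75, but BCR is only 0.5
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 1, 1]
print(balanced_classification_rate(y_pred, y_true))  # 0.5
```

The example illustrates the motivation for balancing: a classifier that ignores the minority class can still score high plain accuracy.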

class CV(pos_class=1, labels=None)

Bases: ethicml.metrics.metric.CfmMetric

Calders-Verwer.

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float
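A sketch of the Calders-Verwer score and its absolute variant, assuming the score is 1 minus the difference in positive-prediction rates between the two groups of a binary sensitive attribute. The sign convention (s=0 minus s=1) and the function names are assumptions for illustration, not ethicml's code. The example shows why the absolute variant is easier to compare: relabelling the groups changes `cv_score` but not `abs_cv_score`.

```python
from typing import Sequence


def positive_rate(y_pred: Sequence[int], s: Sequence[int], group: int, pos_class: int = 1) -> float:
    """P(y_hat = pos_class | s = group)."""
    preds = [p for p, a in zip(y_pred, s) if a == group]
    return sum(1 for p in preds if p == pos_class) / len(preds)


def cv_score(y_pred: Sequence[int], s: Sequence[int], pos_class: int = 1) -> float:
    # ASSUMPTION: the difference is taken as group s=0 minus group s=1
    diff = positive_rate(y_pred, s, 0, pos_class) - positive_rate(y_pred, s, 1, pos_class)
    return 1 - diff


def abs_cv_score(y_pred: Sequence[int], s: Sequence[int], pos_class: int = 1) -> float:
    # Using |diff| makes the result independent of which group is coded s=0
    diff = positive_rate(y_pred, s, 0, pos_class) - positive_rate(y_pred, s, 1, pos_class)
    return 1 - abs(diff)


y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
s_swapped = [1 - a for a in s]
print(cv_score(y_pred, s), cv_score(y_pred, s_swapped))          # 0.75 1.25
print(abs_cv_score(y_pred, s), abs_cv_score(y_pred, s_swapped))  # 0.75 0.75
```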

class DependencyTarget(value)

Bases: enum.Enum

The variable that is compared to the predictions in order to check how similar they are.

class F1

Bases: ethicml.metrics.accuracy.SklearnMetric

F1 score: harmonic mean of precision and recall.

Return type: None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float
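`F1` is documented as wrapping an sklearn metric. For reference, the binary F1 score can be computed directly from confusion counts, as in this sketch. It assumes binary labels and plain lists; the function name and the 0.0 fallback for an undefined score are assumptions.

```python
from typing import Sequence


def f1(y_pred: Sequence[int], y_true: Sequence[int], pos_class: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for p, t in zip(y_pred, y_true) if p == pos_class and t == pos_class)
    fp = sum(1 for p, t in zip(y_pred, y_true) if p == pos_class and t != pos_class)
    fn = sum(1 for p, t in zip(y_pred, y_true) if p != pos_class and t == pos_class)
    # 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN), avoiding separate P and R
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # ASSUMPTION: 0.0 when no positives at all


print(f1(y_pred=[1, 0, 1, 0], y_true=[1, 1, 0, 0]))  # 0.5
```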

class FNR(pos_class=1, labels=None)

Bases: ethicml.metrics.metric.CfmMetric

False negative rate.

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

class FPR(pos_class=1, labels=None)

Bases: ethicml.metrics.metric.CfmMetric

False positive rate.

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

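class Hsic(seed=888)

Bases: ethicml.metrics.metric.BaseMetric

Hilbert-Schmidt Independence Criterion (HSIC).

Parameters: seed (int) – random seed
Return type: None

property name: str: Name of the metric.

score(prediction, actual)

Compute the HSIC score, averaged over batches of the data.

Computing HSIC on a large dataset in one go would exhaust the machine's memory, so the score is computed on smaller batches and the results are averaged.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes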

Return type

float
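The batching idea described above can be sketched in pure Python. This assumes a Gaussian kernel with a fixed bandwidth `sigma` and the biased estimator trace(KHLH)/(n-1)^2, where H is the centring matrix. ethicml's actual kernel, normalisation and batch size are not documented here, so the function names and defaults (other than the seed of 888, taken from the signature) are hypothetical.

```python
import math
import random
from typing import List, Sequence


def gaussian_gram(x: Sequence[float], sigma: float) -> List[List[float]]:
    """Gram matrix K[i][j] = exp(-(x_i - x_j)^2 / (2 sigma^2))."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in x] for a in x]


def center(k: List[List[float]]) -> List[List[float]]:
    """Compute H K H with H = I - (1/n) 11^T, i.e. double-centre the matrix."""
    n = len(k)
    row_means = [sum(r) / n for r in k]
    col_means = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[k[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)] for i in range(n)]


def hsic(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    k = gaussian_gram(x, sigma)
    l_centred = center(gaussian_gram(y, sigma))
    # trace(K (H L H)) = sum_ij K_ij (H L H)_ij because both matrices are symmetric
    total = sum(k[i][j] * l_centred[i][j] for i in range(n) for j in range(n))
    return total / (n - 1) ** 2


def batched_hsic(x: Sequence[float], y: Sequence[float], batch_size: int = 100,
                 seed: int = 888, sigma: float = 1.0) -> float:
    """Average HSIC over random batches, so memory grows with batch_size^2 instead of n^2."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    scores = []
    for start in range(0, len(idx), batch_size):
        batch = idx[start:start + batch_size]
        if len(batch) < 2:  # HSIC needs at least two samples
            continue
        scores.append(hsic([x[i] for i in batch], [y[i] for i in batch], sigma))
    return sum(scores) / len(scores)


xs = [0.1 * i for i in range(50)]
print(hsic(xs, xs) > hsic(xs, [1.0] * 50))  # True: a constant y carries no dependence
print(batched_hsic(xs, xs, batch_size=20))
```

Each batch only needs a few batch_size × batch_size matrices, which is why averaging over batches keeps memory bounded on large datasets, at the cost of ignoring dependence across batches.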

exception LabelOutOfBounds

Bases: Exception

Metric not applicable per sensitive attribute; apply it to the whole dataset instead.

with_traceback(): Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class Metric(*args, **kwargs)

Bases: Protocol

Base class for all metrics.

apply_per_sensitive: ClassVar[bool]: Whether the metric can be applied per sensitive attribute.

abstract property name: str: Name of the metric.

abstract score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float
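Because `Metric` is a `Protocol`, any class with a matching `apply_per_sensitive` class variable, `name` property and `score` method satisfies it structurally, without inheriting from it. The sketch below illustrates this with a hypothetical `ErrorRate` metric. `Prediction` and `DataTuple` here are simplified stand-ins using plain lists (the real ethicml classes hold pandas data), and `evaluate` is a hypothetical helper, not `em.run_metrics`.

```python
from dataclasses import dataclass
from typing import ClassVar, Dict, List, Protocol


@dataclass
class Prediction:
    """Hypothetical stand-in for ethicml's Prediction (hard labels only)."""
    hard: List[int]


@dataclass
class DataTuple:
    """Hypothetical stand-in for ethicml's DataTuple (labels and sensitive attribute)."""
    y: List[int]
    s: List[int]


class Metric(Protocol):
    """Structural interface mirroring the documented members."""
    apply_per_sensitive: ClassVar[bool]

    @property
    def name(self) -> str: ...

    def score(self, prediction: Prediction, actual: DataTuple) -> float: ...


class ErrorRate:
    """Hypothetical custom metric: fraction of misclassified samples."""
    apply_per_sensitive: ClassVar[bool] = True  # a per-group error rate is meaningful

    @property
    def name(self) -> str:
        return "Error rate"

    def score(self, prediction: Prediction, actual: DataTuple) -> float:
        wrong = sum(1 for p, t in zip(prediction.hard, actual.y) if p != t)
        return wrong / len(actual.y)


def evaluate(metrics: List[Metric], prediction: Prediction, actual: DataTuple) -> Dict[str, float]:
    """Hypothetical helper: score every metric and key the result by its name."""
    return {m.name: m.score(prediction, actual) for m in metrics}


print(evaluate([ErrorRate()], Prediction(hard=[1, 0, 1, 1]), DataTuple(y=[1, 1, 1, 1], s=[0, 0, 1, 1])))
# {'Error rate': 0.25}
```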

class NMI(base=DependencyTarget.s)

Bases: ethicml.metrics.dependence_measures._DependenceMeasure

Normalized Mutual Information.

Also called V-measure. Defined in this paper: https://www.aclweb.org/anthology/D07-1043.pdf

Parameters: base (DependencyTarget) – the variable that the predictions are compared with
Return type: None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float
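NMI with arithmetic-mean normalisation, 2·I(A;B)/(H(A)+H(B)), coincides with the V-measure. The pure-Python sketch below computes it for two lists of discrete labels; in `NMI`, the variable compared with the predictions is chosen by `base` (the sensitive attribute by default). The function names, and returning 1.0 when both inputs are constant (the convention scikit-learn uses), are assumptions of this sketch.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Empirical mutual information I(A; B) in nats."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # p(x, y) * log(p(x, y) / (p(x) p(y))), written with raw counts
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y])) for (x, y), c in joint.items())


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Arithmetic-mean normalised mutual information (equal to the V-measure)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:  # both labelings are constant
        return 1.0
    return 2 * mutual_information(a, b) / (h_a + h_b)


print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 6))  # 1.0: one labeling determines the other
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))            # 0.0: independent labelings
```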

class NPV(pos_class=1, labels=None)

Bases: ethicml.metrics.metric.CfmMetric

Negative predictive value.

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

class PPV(pos_class=1, labels=None)

Bases: ethicml.metrics.metric.CfmMetric

Positive predictive value.

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

class ProbNeg(pos_class=1, labels=None)

Bases: ethicml.metrics.metric.CfmMetric

Probability of negative prediction.

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

class ProbOutcome

Bases: ethicml.metrics.metric.BaseMetric

Mean of logits.

Return type: None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

class ProbPos(pos_class=1, labels=None)

Bases: ethicml.metrics.metric.CfmMetric

Probability of positive prediction.

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float
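`ProbPos` is the share of predictions equal to the positive class, and for binary labels `ProbNeg` is its complement. Evaluating it separately for each value of the sensitive attribute is what allows positive rates to be compared between groups. A minimal sketch with plain lists; the function names are hypothetical and this is not ethicml's implementation.

```python
from typing import Dict, Sequence


def prob_pos(y_pred: Sequence[int], pos_class: int = 1) -> float:
    """Fraction of predictions equal to pos_class."""
    return sum(1 for p in y_pred if p == pos_class) / len(y_pred)


def prob_neg(y_pred: Sequence[int], pos_class: int = 1) -> float:
    """Fraction of predictions not equal to pos_class (1 - prob_pos)."""
    return 1 - prob_pos(y_pred, pos_class)


def prob_pos_per_group(y_pred: Sequence[int], s: Sequence[int], pos_class: int = 1) -> Dict[int, float]:
    """prob_pos evaluated separately for each value of the sensitive attribute."""
    return {g: prob_pos([p for p, a in zip(y_pred, s) if a == g], pos_class) for g in sorted(set(s))}


y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
print(prob_pos(y_pred), prob_neg(y_pred))  # 0.375 0.625
print(prob_pos_per_group(y_pred, s))       # {0: 0.5, 1: 0.25}
```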

class RenyiCorrelation(base=DependencyTarget.s)

Bases: ethicml.metrics.dependence_measures._DependenceMeasure

Renyi correlation. Measures how dependent two random variables are.

As defined in the paper “On Measures of Dependence” by Alfréd Rényi: https://link.springer.com/content/pdf/10.1007/BF02024507.pdf

Parameters: base (DependencyTarget) – the variable that the predictions are compared with
Return type: None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float
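The Rényi maximal correlation is the supremum of corr(f(X), g(Y)) over all non-constant functions f and g, which is hard to compute in general. For two binary variables it simplifies: any function of a two-valued variable is affine in it, so the maximal correlation equals the absolute Pearson correlation. The sketch below covers only that binary case and is not ethicml's implementation; the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def renyi_binary(x: Sequence[int], y: Sequence[int]) -> float:
    """Rényi maximal correlation, valid ONLY when x and y each take exactly two values."""
    if len(set(x)) != 2 or len(set(y)) != 2:
        raise ValueError("this shortcut only holds for two binary variables")
    # Any f(x) of a two-valued x is affine in x and |corr| is invariant to affine maps,
    # so the supremum is attained with f and g equal to the identity (up to sign)
    return abs(pearson(x, y))


print(renyi_binary([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: perfectly dependent
print(renyi_binary([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0: independent
```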

class SklearnMetric(*args, **kwargs)

Bases: ethicml.metrics.metric.BaseMetric, Protocol

Wrapper around an sklearn metric.

apply_per_sensitive: ClassVar[bool]: Whether the metric can be applied per sensitive attribute.

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

class TNR(pos_class=1, labels=None)

Bases: ethicml.metrics.metric.CfmMetric

True negative rate.

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

class TPR(pos_class=1, labels=None)

Bases: ethicml.metrics.metric.CfmMetric

True positive rate.

Parameters

pos_class (int) –
labels (Optional[List[int]]) –

Return type

None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

class Theil

Bases: ethicml.metrics.metric.BaseMetric

Theil Index.

Return type: None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float
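A sketch of the Theil index under the convention, common in the fairness literature, that each individual's benefit is b_i = y_hat_i - y_i + 1, so false negatives get 0, correct predictions 1 and false positives 2. That benefit definition is an assumption here; this reference does not state which one ethicml uses, and the function name is hypothetical.

```python
import math
from typing import Sequence


def theil_index(y_pred: Sequence[int], y_true: Sequence[int], pos_class: int = 1) -> float:
    """Theil index of the individual benefits b_i = y_hat_i - y_i + 1 (binary labels)."""
    # Benefit is 0 for a false negative, 1 for a correct prediction, 2 for a false positive
    b = [int(p == pos_class) - int(t == pos_class) + 1 for p, t in zip(y_pred, y_true)]
    mu = sum(b) / len(b)  # zero (undefined index) only if every sample is a false negative
    # T = (1/n) * sum_i (b_i / mu) * ln(b_i / mu), using the convention 0 * ln 0 = 0
    return sum((bi / mu) * math.log(bi / mu) for bi in b if bi > 0) / len(b)


print(theil_index(y_pred=[1, 0, 1, 0], y_true=[1, 0, 1, 0]))      # 0.0: perfect predictions
print(theil_index(y_pred=[1, 0, 1, 0], y_true=[0, 0, 1, 1]) > 0)  # True
```

A value of 0 means every individual receives the same benefit; larger values indicate a more unequal distribution of benefits.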

class Yanovich(base=DependencyTarget.s)

Bases: ethicml.metrics.dependence_measures._DependenceMeasure

Yanovich Metric. Measures how dependent two random variables are.

As defined in this paper: https://arxiv.org/abs/1008.0492

Parameters: base (DependencyTarget) – the variable that the predictions are compared with
Return type: None

property name: str: Name of the metric.

score(prediction, actual)

Compute score.

Parameters

prediction (ethicml.utility.data_structures.Prediction) – predicted labels
actual (ethicml.utility.data_structures.DataTuple) – DataTuple with the actual labels and the sensitive attributes

Returns

the score as a single number

Return type

float

confusion_matrix(prediction, actual, pos_cls, labels=None)

Apply scikit-learn’s confusion matrix.

Parameters

prediction (ethicml.utility.data_structures.Prediction) –
actual (ethicml.utility.data_structures.DataTuple) –
pos_cls (int) –
labels (Optional[List[int]]) –

Return type

Tuple[int, int, int, int]

Utils

Compare metrics between groups