Running experiments on the Adult dataset#
Installation#
First we need to install EthicML. You can install it from PyPI:
pip install ethicml
Loading the data#
EthicML includes several datasets that are commonly used in the fairness literature. First, we load one of these; in this example we use the UCI Adult dataset.
[1]:
import ethicml as em
from ethicml.data import Adult
adult = Adult()
data: em.DataTuple = adult.load()
assert (45222, 101) == data.x.shape
assert (45222,) == data.s.shape
assert (45222,) == data.y.shape
This loads the dataset as a DataTuple, which comprises \(x\) (features), \(s\) (sensitive attribute) and \(y\) (class label). The features \(x\) are stored as a Pandas DataFrame, while \(s\) and \(y\) are Pandas Series.
By default, the Adult dataset uses the binary attribute sex_Male as the sensitive feature.
[2]:
data.s.to_frame().head()
[2]:
|   | sex_Male |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 |
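Because the members of the DataTuple are plain Pandas objects, standard pandas operations apply to them directly. For example, a quick sketch of the positive-label rate within each sex group:

# Fraction of positive class labels (y == 1) within each sensitive group,
# computed with an ordinary pandas groupby on the DataTuple members.
data.y.groupby(data.s).mean()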
If we want to run experiments using race as the sensitive attribute, we could change this manually; but, as this is a common task, EthicML can produce the split for us.
[3]:
data: em.DataTuple = Adult(split=Adult.Splits.RACE).load()
assert (45222, 98) == data.x.shape
assert (45222,) == data.s.shape
assert (45222,) == data.y.shape
[4]:
data.s.to_frame().head()
[4]:
|   | race |
|---|---|
| 0 | 4 |
| 1 | 4 |
| 2 | 4 |
| 3 | 2 |
| 4 | 4 |
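Note that with this split the sensitive attribute is integer-encoded with several race categories rather than a binary value; a quick look at the group sizes (plain pandas) makes this clear:

# Number of rows in each integer-encoded race category.
data.s.value_counts()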
However, we are going to repeat some of the experiments from FairGP. That paper also uses race as the sensitive attribute, but in binary form: the value is either White or Not_White. Fortunately, EthicML provides a split for exactly that: RACE_BINARY.
[5]:
data = Adult(split=Adult.Splits.RACE_BINARY).load()
[6]:
data.s.to_frame().head()
[6]:
|   | race_White |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
| 4 | 1 |
Evaluating some models#
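evaluate_models trains each combination of the listed pre-processing and in-process algorithms over repeated train/test splits and scores the predictions with the requested metrics, both overall and per sensitive group. For orientation, a single manual run of one model looks roughly like the sketch below (this assumes the usual EthicML interfaces, i.e. a top-level train_test_split function, an algorithm's run method and a metric's score method; check the API reference of your installed version):

from ethicml.models import LR
from ethicml.metrics import Accuracy

# Split the DataTuple into train and test portions (sketch; the exact
# signature may differ slightly between EthicML versions).
train, test = em.train_test_split(data)
# Train logistic regression on the train split and predict on the test split.
preds = LR().run(train, test)
# Overall accuracy of the predictions on the test set.
print(Accuracy().score(preds, test))

The cell below runs the full benchmark instead.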
[7]:
from ethicml.run import evaluate_models
from ethicml.models import LR, Reweighting, SVM, Upsampler
from ethicml.metrics import Accuracy, CV, ProbPos, TPR
results = evaluate_models(
datasets=[em.data.Adult(split=Adult.Splits.RACE_BINARY)],
preprocess_models=[Upsampler()],
inprocess_models=[LR(), SVM(kernel=em.KernelType.linear), Reweighting()],
metrics=[Accuracy(), CV()],
per_sens_metrics=[Accuracy(), TPR(), ProbPos()],
repeats=2,
)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.2s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.5s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 1.6s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 2.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 2.4s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 2.7s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 2.7s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.2s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.3s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.2s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.4s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 1.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 1.6s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 1.9s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 2.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 2.1s finished
[8]:
results
[8]:
| dataset | scaler | transform | model | split_id | model_seed | seed | C | Accuracy | CV | Accuracy_race_White_1 | Accuracy_race_White_0 | Accuracy_race_White_0÷race_White_1 | Accuracy_race_White_0-race_White_1 | TPR_race_White_1 | TPR_race_White_0 | TPR_race_White_0÷race_White_1 | TPR_race_White_0-race_White_1 | prob_pos_race_White_1 | prob_pos_race_White_0 | prob_pos_race_White_0÷race_White_1 | prob_pos_race_White_0-race_White_1 | kernel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adult Race-Binary | None | no_transform | Logistic Regression (C=1.0) | 0 | 0.0 | 0.0 | 1.0 | 0.851078 | 0.925891 | 0.843309 | 0.899126 | 0.937920 | 0.055818 | 0.613392 | 0.621495 | 0.986962 | 0.008103 | 0.216286 | 0.142176 | 0.657354 | 0.074109 | NaN |
| | | | SVM (linear) | 0 | 0.0 | 0.0 | 1.0 | 0.851962 | 0.914970 | 0.844464 | 0.898332 | 0.940036 | 0.053868 | 0.619746 | 0.593458 | 0.957583 | 0.026288 | 0.218469 | 0.133439 | 0.610792 | 0.085030 | linear |
| | | | Kamiran & Calders lr C=1.0 | 0 | 0.0 | 0.0 | NaN | 0.850636 | 0.931427 | 0.843565 | 0.894361 | 0.943205 | 0.050795 | 0.612414 | 0.621495 | 0.985389 | 0.009081 | 0.215515 | 0.146942 | 0.681818 | 0.068573 | NaN |
| | | | Logistic Regression (C=1.0) | 1 | 1.0 | 2410.0 | 1.0 | 0.841902 | 0.931427 | 0.835231 | 0.884365 | 0.944442 | 0.049134 | 0.570395 | 0.553922 | 0.971119 | 0.016474 | 0.202124 | 0.133550 | 0.660737 | 0.068573 | NaN |
| | | | SVM (linear) | 1 | 1.0 | 2410.0 | 1.0 | 0.843449 | 0.929112 | 0.836382 | 0.888436 | 0.941409 | 0.052054 | 0.572324 | 0.558824 | 0.976411 | 0.013500 | 0.201996 | 0.131107 | 0.649061 | 0.070888 | linear |
| | | | Kamiran & Calders lr C=1.0 | 1 | 1.0 | 2410.0 | NaN | 0.844002 | 0.932603 | 0.837150 | 0.887622 | 0.943138 | 0.050472 | 0.581003 | 0.578431 | 0.995574 | 0.002572 | 0.205833 | 0.138436 | 0.672566 | 0.067397 | NaN |
| | | Upsample uniform | Logistic Regression (C=1.0) | 0 | 0.0 | 0.0 | 1.0 | 0.849309 | 0.923988 | 0.841896 | 0.895155 | 0.940503 | 0.053259 | 0.615836 | 0.612150 | 0.994014 | 0.003686 | 0.218983 | 0.142971 | 0.652885 | 0.076012 | NaN |
| | | | SVM (linear) | 0 | 0.0 | 0.0 | 1.0 | 0.850083 | 0.929254 | 0.842538 | 0.896743 | 0.939553 | 0.054206 | 0.607038 | 0.616822 | 0.984138 | 0.009784 | 0.213717 | 0.142971 | 0.668972 | 0.070746 | linear |
| | | | Kamiran & Calders lr C=1.0 | 0 | 0.0 | 0.0 | NaN | 0.850636 | 0.925529 | 0.843437 | 0.895155 | 0.942225 | 0.051718 | 0.615836 | 0.612150 | 0.994014 | 0.003686 | 0.217442 | 0.142971 | 0.657513 | 0.074471 | NaN |
| | | | Logistic Regression (C=1.0) | 1 | 0.0 | 2410.0 | 1.0 | 0.842565 | 0.936138 | 0.835871 | 0.885179 | 0.944295 | 0.049309 | 0.570395 | 0.568627 | 0.996901 | 0.001768 | 0.201484 | 0.137622 | 0.683043 | 0.063862 | NaN |
| | | | SVM (linear) | 1 | 0.0 | 2410.0 | 1.0 | 0.840575 | 0.946316 | 0.833440 | 0.885993 | 0.940684 | 0.052554 | 0.557377 | 0.588235 | 0.947541 | 0.030858 | 0.197007 | 0.143322 | 0.727501 | 0.053684 | linear |
| | | | Kamiran & Calders lr C=1.0 | 1 | 0.0 | 2410.0 | NaN | 0.842565 | 0.936138 | 0.835871 | 0.885179 | 0.944295 | 0.049309 | 0.570395 | 0.568627 | 0.996901 | 0.001768 | 0.201484 | 0.137622 | 0.683043 | 0.063862 | NaN |
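In the per-group columns, the _race_White_1 and _race_White_0 suffixes give the metric for each sensitive group separately, while the ÷ and - variants report the ratio and the difference between the two groups (in these results the ratio is always the smaller value over the larger, so 1.0 indicates parity).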
[9]:
results[
['Accuracy', 'Accuracy_race_White_0÷race_White_1', 'TPR_race_White_0÷race_White_1', 'prob_pos_race_White_0÷race_White_1']
].groupby(level=[0, 1, 2, 3]).agg(['mean', 'std'])
[9]:
| dataset | scaler | transform | model | Accuracy (mean) | Accuracy (std) | Accuracy_race_White_0÷race_White_1 (mean) | Accuracy_race_White_0÷race_White_1 (std) | TPR_race_White_0÷race_White_1 (mean) | TPR_race_White_0÷race_White_1 (std) | prob_pos_race_White_0÷race_White_1 (mean) | prob_pos_race_White_0÷race_White_1 (std) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Adult Race-Binary | None | Upsample uniform | Kamiran & Calders lr C=1.0 | 0.846600 | 0.005707 | 0.943260 | 0.001464 | 0.995457 | 0.002041 | 0.670278 | 0.018052 |
| | | | Logistic Regression (C=1.0) | 0.845937 | 0.004769 | 0.942399 | 0.002682 | 0.995457 | 0.002041 | 0.667964 | 0.021325 |
| | | | SVM (linear) | 0.845329 | 0.006723 | 0.940118 | 0.000800 | 0.965839 | 0.025878 | 0.698237 | 0.041386 |
| | | no_transform | Kamiran & Calders lr C=1.0 | 0.847319 | 0.004691 | 0.943171 | 0.000048 | 0.990481 | 0.007202 | 0.677192 | 0.006542 |
| | | | Logistic Regression (C=1.0) | 0.846490 | 0.006489 | 0.941181 | 0.004611 | 0.979040 | 0.011203 | 0.659046 | 0.002392 |
| | | | SVM (linear) | 0.847706 | 0.006020 | 0.940723 | 0.000971 | 0.966997 | 0.013314 | 0.629927 | 0.027060 |
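Since the results object behaves like a regular pandas DataFrame (as the indexing and groupby above rely on), it can be saved for later analysis in the usual pandas way; the file name below is only an example:

# Persist the full benchmark results with plain pandas I/O (illustrative path).
results.to_csv("adult_race_binary_results.csv")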