Running experiments on the Adult dataset#

Installation#

First we need to install EthicML. You can install it from PyPI:

pip install ethicml

Loading the data#

EthicML includes several datasets that are commonly used in the fairness literature. First, we load one of these; in this example, we load the UCI Adult dataset.

[1]:
import ethicml as em
from ethicml.data import Adult

adult = Adult()
data: em.DataTuple = adult.load()
assert (45222, 101) == data.x.shape
assert (45222,) == data.s.shape
assert (45222,) == data.y.shape

This loads the dataset as a DataTuple, which comprises \(x\) (features), \(s\) (sensitive attribute) and \(y\) (class label). The \(x\) member is stored as a Pandas DataFrame, while \(s\) and \(y\) are Pandas Series (as the shape assertions above show).

By default, the Adult dataset uses the binary attribute sex_Male as the sensitive feature.

[2]:
data.s.to_frame().head()
[2]:
   sex_Male
0         1
1         1
2         1
3         1
4         0
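To get a feel for how the groups are balanced, you can count the values in `data.s`. Here is a minimal sketch using plain pandas, with a toy stand-in for the real Series (the column name `sex_Male` matches the output above, but the values are made up):

```python
import pandas as pd

# Toy stand-in for data.s: 1 = male, 0 = not male (values are illustrative).
s = pd.Series([1, 1, 1, 1, 0, 0, 1, 0], name="sex_Male")

# Count how many samples fall in each sensitive group.
counts = s.value_counts()
```

On the real data, the same `value_counts()` call reveals how imbalanced the sensitive groups are, which is worth checking before training any model.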

If we want to run experiments using race as the sensitive attribute, we could change this manually, but as this is a common task, EthicML can split the data for us.

[3]:
data: em.DataTuple = Adult(split=Adult.Splits.RACE).load()
assert (45222, 98) == data.x.shape
assert (45222,) == data.s.shape
assert (45222,) == data.y.shape
[4]:
data.s.to_frame().head()
[4]:
   race
0     4
1     4
2     4
3     2
4     4

However, we’re going to repeat some of the experiments from FairGP. That paper also uses race as the sensitive attribute, but binarized: the value of race is either White or Not_White.

Fortunately, EthicML has a split for that: RACE_BINARY.

[5]:
data = Adult(split=Adult.Splits.RACE_BINARY).load()
[6]:
data.s.to_frame().head()
[6]:
   race_White
0           1
1           1
2           1
3           0
4           1
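The per-group metrics we compute below (such as `ProbPos`) are conceptually simple: for example, the positive rate within each sensitive group. A self-contained pandas sketch with made-up labels (the column names only mimic this dataset):

```python
import pandas as pd

# Made-up sensitive attribute and binary predictions.
df = pd.DataFrame({
    "race_White": [1, 1, 0, 0, 1, 0],
    "prediction": [1, 0, 1, 0, 1, 0],
})

# Rate of positive predictions within each sensitive group.
prob_pos = df.groupby("race_White")["prediction"].mean()
```

Comparing these per-group rates (as a ratio or a difference) is exactly what the `÷` and `-` columns in the results table below report; EthicML computes them for you.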

Evaluating some models#

[7]:
from ethicml.run import evaluate_models
from ethicml.models import LR, Reweighting, SVM, Upsampler
from ethicml.metrics import Accuracy, CV, ProbPos, TPR

results = evaluate_models(
    datasets=[em.data.Adult(split=Adult.Splits.RACE_BINARY)],
    preprocess_models=[Upsampler()],
    inprocess_models=[LR(), SVM(kernel=em.KernelType.linear), Reweighting()],
    metrics=[Accuracy(), CV()],
    per_sens_metrics=[Accuracy(), TPR(), ProbPos()],
    repeats=2,
)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.5s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    1.6s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    2.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    2.4s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    2.7s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    2.7s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.4s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    1.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    1.6s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    1.9s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    2.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    2.1s finished
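The object returned by `evaluate_models` is a pandas DataFrame, so you can persist it with the usual pandas I/O. A sketch using a small stand-in frame (since re-running the full evaluation is slow); the column names are taken from the output below:

```python
import io
import pandas as pd

# Stand-in for the real results frame returned by evaluate_models.
results = pd.DataFrame({"Accuracy": [0.851, 0.852], "CV": [0.926, 0.915]})

# Round-trip through CSV so results survive the notebook session.
buf = io.StringIO()
results.to_csv(buf, index=False)
buf.seek(0)
restored = pd.read_csv(buf)
```

In practice you would write to a real file path instead of an in-memory buffer.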
[8]:
results
[8]:
model_seed seed C Accuracy CV Accuracy_race_White_1 Accuracy_race_White_0 Accuracy_race_White_0÷race_White_1 Accuracy_race_White_0-race_White_1 TPR_race_White_1 TPR_race_White_0 TPR_race_White_0÷race_White_1 TPR_race_White_0-race_White_1 prob_pos_race_White_1 prob_pos_race_White_0 prob_pos_race_White_0÷race_White_1 prob_pos_race_White_0-race_White_1 kernel
dataset scaler transform model split_id
Adult Race-Binary None no_transform Logistic Regression (C=1.0) 0 0.0 0.0 1.0 0.851078 0.925891 0.843309 0.899126 0.937920 0.055818 0.613392 0.621495 0.986962 0.008103 0.216286 0.142176 0.657354 0.074109 NaN
SVM (linear) 0 0.0 0.0 1.0 0.851962 0.914970 0.844464 0.898332 0.940036 0.053868 0.619746 0.593458 0.957583 0.026288 0.218469 0.133439 0.610792 0.085030 linear
Kamiran & Calders lr C=1.0 0 0.0 0.0 NaN 0.850636 0.931427 0.843565 0.894361 0.943205 0.050795 0.612414 0.621495 0.985389 0.009081 0.215515 0.146942 0.681818 0.068573 NaN
Logistic Regression (C=1.0) 1 1.0 2410.0 1.0 0.841902 0.931427 0.835231 0.884365 0.944442 0.049134 0.570395 0.553922 0.971119 0.016474 0.202124 0.133550 0.660737 0.068573 NaN
SVM (linear) 1 1.0 2410.0 1.0 0.843449 0.929112 0.836382 0.888436 0.941409 0.052054 0.572324 0.558824 0.976411 0.013500 0.201996 0.131107 0.649061 0.070888 linear
Kamiran & Calders lr C=1.0 1 1.0 2410.0 NaN 0.844002 0.932603 0.837150 0.887622 0.943138 0.050472 0.581003 0.578431 0.995574 0.002572 0.205833 0.138436 0.672566 0.067397 NaN
Upsample uniform Logistic Regression (C=1.0) 0 0.0 0.0 1.0 0.849309 0.923988 0.841896 0.895155 0.940503 0.053259 0.615836 0.612150 0.994014 0.003686 0.218983 0.142971 0.652885 0.076012 NaN
SVM (linear) 0 0.0 0.0 1.0 0.850083 0.929254 0.842538 0.896743 0.939553 0.054206 0.607038 0.616822 0.984138 0.009784 0.213717 0.142971 0.668972 0.070746 linear
Kamiran & Calders lr C=1.0 0 0.0 0.0 NaN 0.850636 0.925529 0.843437 0.895155 0.942225 0.051718 0.615836 0.612150 0.994014 0.003686 0.217442 0.142971 0.657513 0.074471 NaN
Logistic Regression (C=1.0) 1 0.0 2410.0 1.0 0.842565 0.936138 0.835871 0.885179 0.944295 0.049309 0.570395 0.568627 0.996901 0.001768 0.201484 0.137622 0.683043 0.063862 NaN
SVM (linear) 1 0.0 2410.0 1.0 0.840575 0.946316 0.833440 0.885993 0.940684 0.052554 0.557377 0.588235 0.947541 0.030858 0.197007 0.143322 0.727501 0.053684 linear
Kamiran & Calders lr C=1.0 1 0.0 2410.0 NaN 0.842565 0.936138 0.835871 0.885179 0.944295 0.049309 0.570395 0.568627 0.996901 0.001768 0.201484 0.137622 0.683043 0.063862 NaN
[9]:
results[
    ['Accuracy', 'Accuracy_race_White_0÷race_White_1', 'TPR_race_White_0÷race_White_1', 'prob_pos_race_White_0÷race_White_1']
].groupby(level=[0, 1, 2, 3]).agg(['mean', 'std'])
[9]:
Accuracy Accuracy_race_White_0÷race_White_1 TPR_race_White_0÷race_White_1 prob_pos_race_White_0÷race_White_1
mean std mean std mean std mean std
dataset scaler transform model
Adult Race-Binary None Upsample uniform Kamiran & Calders lr C=1.0 0.846600 0.005707 0.943260 0.001464 0.995457 0.002041 0.670278 0.018052
Logistic Regression (C=1.0) 0.845937 0.004769 0.942399 0.002682 0.995457 0.002041 0.667964 0.021325
SVM (linear) 0.845329 0.006723 0.940118 0.000800 0.965839 0.025878 0.698237 0.041386
no_transform Kamiran & Calders lr C=1.0 0.847319 0.004691 0.943171 0.000048 0.990481 0.007202 0.677192 0.006542
Logistic Regression (C=1.0) 0.846490 0.006489 0.941181 0.004611 0.979040 0.011203 0.659046 0.002392
SVM (linear) 0.847706 0.006020 0.940723 0.000971 0.966997 0.013314 0.629927 0.027060
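The `groupby(level=...).agg(['mean', 'std'])` pattern above averages each metric over the two repeats. Here is a self-contained toy version with a two-level index (the index names and values are illustrative, not the real results):

```python
import pandas as pd

# Toy results frame: two repeats per (dataset, model) combination.
idx = pd.MultiIndex.from_tuples(
    [("Adult", "LR"), ("Adult", "LR"), ("Adult", "SVM"), ("Adult", "SVM")],
    names=["dataset", "model"],
)
results = pd.DataFrame({"Accuracy": [0.85, 0.84, 0.85, 0.84]}, index=idx)

# Aggregate over repeats: mean and standard deviation per model.
summary = results.groupby(level=["dataset", "model"]).agg(["mean", "std"])
```

The resulting frame has a column MultiIndex of `(metric, statistic)` pairs, which is what produces the `mean`/`std` sub-columns in the output above.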