Fairness in Machine Learning

by Oliver Thomas and Thomas Kehrenberg

{ot44,t.kehrenberg}@sussex.ac.uk - Predictive Analytics Lab (PAL), University of Sussex, UK

Machine Learning

...using statistical techniques to give computer systems the ability to "learn" (e.g., progressively improve performance on a specific task) from data, without being explicitly programmed.

But there are problems:

Algorithmic bias

Machine learning systems are making decisions that affect humans.
These decisions should be fair.
By default, machine learning algorithms tend to be biased in some way.

why?

Because the world is complicated!

The data is unlikely to be perfect.

The developer's (often arbitrary) decisions have a downstream impact

Business goals don't always align with fair behaviour

Biased training data

Bias introduced by the ML algorithm

Which fairness criteria to choose?

Let's have all the fairness!

If we are fair with regard to all notions of fairness, then we're fair... right?

Independence-based fairness (e.g. Statistical Parity)

$$ \hat{Y} \perp S $$

Separation-based fairness (e.g. Equalised Odds, or its relaxation Equal Opportunity)

$$ \hat{Y} \perp S | Y $$
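To make the two criteria concrete, here is a minimal sketch that measures them on a small, made-up set of predictions. The function names and the toy data are illustrative assumptions, not part of the original slides. Independence compares acceptance rates across groups; separation compares true and false positive rates across groups.

```python
def rate(values):
    """Mean of a list of 0/1 values; None if the list is empty."""
    return sum(values) / len(values) if values else None

def acceptance_rate(y_pred, s, group):
    """P(Y_hat = 1 | S = group): the quantity compared under independence."""
    return rate([p for p, g in zip(y_pred, s) if g == group])

def true_positive_rate(y_true, y_pred, s, group):
    """P(Y_hat = 1 | Y = 1, S = group): compared under separation."""
    return rate([p for t, p, g in zip(y_true, y_pred, s) if g == group and t == 1])

def false_positive_rate(y_true, y_pred, s, group):
    """P(Y_hat = 1 | Y = 0, S = group): also compared under separation."""
    return rate([p for t, p, g in zip(y_true, y_pred, s) if g == group and t == 0])

# Hypothetical toy data: two groups "a" and "b" with four individuals each.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
s = ["a", "a", "a", "a", "b", "b", "b", "b"]

for group in ("a", "b"):
    print(group,
          acceptance_rate(y_pred, s, group),
          true_positive_rate(y_true, y_pred, s, group),
          false_positive_rate(y_true, y_pred, s, group))
```

In this toy data both groups have the same acceptance rate, so independence holds, while their true positive rates differ, so separation does not.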

For both to hold (in binary classification), either

$S \perp Y$, our data is fair, or
$\hat{Y} \perp Y$, we have a random predictor.
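A short sketch of why these are the only two options, assuming binary $Y$ and $\hat{Y}$ and two groups $a$ and $b$. The shorthand $p_s$, $\mathrm{TPR}$ and $\mathrm{FPR}$ is introduced here for illustration. Write $p_s = P(Y=1 \mid S=s)$. Under separation, the true and false positive rates do not depend on $s$, so

$$ P(\hat{Y}=1 \mid S=s) = \mathrm{TPR} \cdot p_s + \mathrm{FPR} \cdot (1 - p_s) = \mathrm{FPR} + (\mathrm{TPR} - \mathrm{FPR}) \, p_s $$

Independence requires this quantity to be the same for $s=a$ and $s=b$, which gives

$$ (\mathrm{TPR} - \mathrm{FPR}) \, (p_a - p_b) = 0 $$

So either $p_a = p_b$ (that is, $S \perp Y$), or $\mathrm{TPR} = \mathrm{FPR}$ (that is, $\hat{Y} \perp Y$).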

Similarly, Sufficiency ($Y \perp S | \hat{Y}$) cannot, in general, hold at the same time as either of the other two notions of fairness.

Illustrative Example

Consider a university where we are in charge of admissions!

We can only accept 50% of all applicants.

10,000 applicants are female and 10,000 applicants are male.

We have been tasked with being fair with regard to gender.

University Admission

We have acceptance criteria that are highly predictive of success.

80% of those who meet the acceptance criteria will successfully graduate.

Only 10% of those who don't meet the acceptance criteria will successfully graduate.

As we're a good university, we receive a lot of applications from people who don't meet the acceptance criteria.

60% of female applicants meet the acceptance criteria.

40% of male applicants meet the acceptance criteria.

Remember, we can only accept 50% of all applicants

What should we do?

Truth Tables

Female Applicants

	Accepted	Not
Actually Graduate	$10000 \times 0.6 \times 0.8$	$10000 \times 0.4 \times 0.1$
Don't Graduate	$10000 \times 0.6 \times 0.2$	$10000 \times 0.4 \times 0.9$

Male Applicants

	Accepted	Not
Actually Graduate	$10000 \times 0.4 \times 0.8$	$10000 \times 0.6 \times 0.1$
Don't Graduate	$10000 \times 0.4 \times 0.2$	$10000 \times 0.6 \times 0.9$

Female Applicants

	Accepted	Not
Actually Graduate	$4800$	$400$
Don't Graduate	$1200$	$3600$

Male Applicants

	Accepted	Not
Actually Graduate	$3200$	$600$
Don't Graduate	$800$	$5400$
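The arithmetic behind these tables can be reproduced with a few lines of Python. This is a sketch under the assumption that the current system accepts exactly the applicants who meet the acceptance criteria. The function name `truth_table` and the `TP`/`FN`/`FP`/`TN` keys are illustrative choices.

```python
def truth_table(n_applicants, share_qualified,
                p_grad_qualified=0.8, p_grad_unqualified=0.1):
    """Expected counts when exactly the qualified applicants are accepted."""
    n_qualified = n_applicants * share_qualified
    n_unqualified = n_applicants - n_qualified
    return {
        # accepted and graduate
        "TP": round(n_qualified * p_grad_qualified),
        # not accepted but would have graduated
        "FN": round(n_unqualified * p_grad_unqualified),
        # accepted but don't graduate
        "FP": round(n_qualified * (1 - p_grad_qualified)),
        # not accepted and would not have graduated
        "TN": round(n_unqualified * (1 - p_grad_unqualified)),
    }

female = truth_table(10_000, 0.6)
male = truth_table(10_000, 0.4)
print("female:", female)
print("male:  ", male)
```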

University Admission

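Our current system accepts exactly the applicants who meet the acceptance criteria (6,000 women and 4,000 men, which is 50% of all applicants), and it satisfies calibration-by-group!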

$$Y \perp S | \hat{Y}$$
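Calibration-by-group can be checked directly from the truth tables: within each decision (accepted or not), the graduation rate should be the same for both groups. A minimal sketch, with the counts copied from the tables above and hypothetical helper names:

```python
# Counts copied from the truth tables of the current system.
tables = {
    "female": {"TP": 4800, "FN": 400, "FP": 1200, "TN": 3600},
    "male": {"TP": 3200, "FN": 600, "FP": 800, "TN": 5400},
}

def graduation_rates(t):
    """Return (P(Y=1 | accepted), P(Y=1 | not accepted)) for one group."""
    among_accepted = t["TP"] / (t["TP"] + t["FP"])
    among_rejected = t["FN"] / (t["FN"] + t["TN"])
    return among_accepted, among_rejected

rates = {group: graduation_rates(t) for group, t in tables.items()}
for group, (accepted, rejected) in rates.items():
    print(f"{group}: {accepted:.2f} of accepted graduate, "
          f"{rejected:.2f} of rejected would have graduated")
```

Both groups give 0.80 and 0.10, which matches the stated graduation rates for applicants who do and do not meet the criteria.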

How would we solve this problem fairly, using Statistical Parity as our measure?

Select 50% of both female and male applicants.

As a result, 1,000 qualified female applicants (10% of all female applicants) are rejected, whilst an additional 1,000 unqualified male applicants (10% of all male applicants) are accepted.

Female Applicants

	Accepted	Not
Actually Graduate	$5000 \times 0.8$	$(1000 \times 0.8) + (4000 \times 0.1)$
Don't Graduate	$5000 \times 0.2$	$(1000 \times 0.2) + (4000 \times 0.9)$

Male Applicants

	Accepted	Not
Actually Graduate	$(4000 \times 0.8) + (1000 \times 0.1)$	$5000 \times 0.1$
Don't Graduate	$(4000 \times 0.2) + (1000 \times 0.9)$	$5000 \times 0.9$

Female Applicants

	Accepted	Not
Actually Graduate	$4000$	$1200$
Don't Graduate	$1000$	$3800$

Male Applicants

	Accepted	Not
Actually Graduate	$3300$	$500$
Don't Graduate	$1700$	$4500$
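A quick check of what this policy achieves, using the counts from the tables above (the helper names are illustrative). Acceptance rates are now equal, but the graduation rate among accepted applicants is no longer the same for both groups, so calibration-by-group is lost.

```python
# Counts copied from the Statistical Parity truth tables.
tables = {
    "female": {"TP": 4000, "FN": 1200, "FP": 1000, "TN": 3800},
    "male": {"TP": 3300, "FN": 500, "FP": 1700, "TN": 4500},
}

def acceptance_rate(t):
    """P(accepted) within one group."""
    return (t["TP"] + t["FP"]) / sum(t.values())

def graduation_rate_among_accepted(t):
    """P(graduate | accepted) within one group."""
    return t["TP"] / (t["TP"] + t["FP"])

for group, t in tables.items():
    print(f"{group}: acceptance rate {acceptance_rate(t):.2f}, "
          f"graduation rate among accepted {graduation_rate_among_accepted(t):.2f}")
```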

Change relative to the current system:

Female Applicants

	Accepted	Not
Actually Graduate	-800	800
Don't Graduate	-200	200

Male Applicants

	Accepted	Not
Actually Graduate	100	-100
Don't Graduate	900	-900

How would we solve this problem fairly, using Equal Opportunity as our measure?

$$TPR = \frac{TP}{TP + FN}$$

	Accepted	Not
Actually Graduate	$TP$	$FN$
Don't Graduate	$FP$	$TN$
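As a small illustration, the TPR of the current system (accepting exactly those who meet the criteria) can be computed from its truth tables. The function name is an illustrative choice.

```python
def true_positive_rate(tp, fn):
    """TPR = TP / (TP + FN): share of eventual graduates who were accepted."""
    if tp + fn == 0:
        raise ValueError("TPR is undefined when there are no actual positives")
    return tp / (tp + fn)

# TP and FN taken from the current system's truth tables.
female_tpr = true_positive_rate(tp=4800, fn=400)
male_tpr = true_positive_rate(tp=3200, fn=600)
print(f"female TPR: {female_tpr:.3f}")  # about 0.923
print(f"male TPR:   {male_tpr:.3f}")    # about 0.842
```

The gap between the two values is what Equal Opportunity asks us to close.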

Select 55.5% of female applicants and 44.5% of male applicants (still 50% of all applicants), giving a TPR of approximately 85.4% for both groups.

Female Applicants

	Accepted	Not
Actually Graduate	$4440$	$760$
Don't Graduate	$1110$	$3690$

Male Applicants

	Accepted	Not
Actually Graduate	$3245$	$555$
Don't Graduate	$1205$	$4995$
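One way to arrive at this split is a simple bisection search. The sketch below rests on assumptions not spelled out in the slides: within each group, applicants who meet the criteria are admitted before those who do not, exactly 10,000 places are filled, and expected counts are used throughout. All names are illustrative.

```python
N = 10_000  # applicants per group, and also the total number of places
P_GRAD_QUALIFIED = 0.8
P_GRAD_UNQUALIFIED = 0.1

def tpr(n_accepted, n_qualified):
    """Expected TPR for one group, admitting qualified applicants first."""
    n_unqualified = N - n_qualified
    accepted_qualified = min(n_accepted, n_qualified)
    accepted_unqualified = n_accepted - accepted_qualified
    graduates_accepted = (P_GRAD_QUALIFIED * accepted_qualified
                          + P_GRAD_UNQUALIFIED * accepted_unqualified)
    graduates_total = (P_GRAD_QUALIFIED * n_qualified
                       + P_GRAD_UNQUALIFIED * n_unqualified)
    return graduates_accepted / graduates_total

def tpr_gap(female_share):
    """Female TPR minus male TPR when female_share of women are accepted."""
    female_accepted = female_share * N
    male_accepted = N - female_accepted  # the remaining places go to men
    return tpr(female_accepted, 6000) - tpr(male_accepted, 4000)

# The gap is negative at 50% and positive at 60%, so bisect in between.
low, high = 0.5, 0.6
for _ in range(50):
    mid = (low + high) / 2
    if tpr_gap(mid) < 0:
        low = mid
    else:
        high = mid

female_share = (low + high) / 2
shared_tpr = tpr(female_share * N, 6000)
print(f"accept {female_share:.1%} of women and {1 - female_share:.1%} of men")
print(f"shared TPR: {shared_tpr:.1%}")
```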

Change relative to the current system:

Female Applicants

	Accepted	Not
Actually Graduate	-360	360
Don't Graduate	-90	90

Male Applicants

	Accepted	Not
Actually Graduate	45	-45
Don't Graduate	405	-405