by Oliver Thomas and Thomas Kehrenberg
{ot44,t.kehrenberg}@sussex.ac.uk - Predictive Analytics Lab (PAL), University of Sussex, UK
...using statistical techniques to give computer systems the ability to "learn" (i.e., progressively improve performance on a specific task) from data, without being explicitly programmed.
But there are problems:
Because the world is complicated!
The data is unlikely to be perfect.
The developer's (often arbitrary) decisions have a downstream impact
Business goals don't always align with fair behaviour
Let's have all the fairness!
If we are fair with regard to every notion of fairness, then we're fair... right?
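Independence-based fairness (i.e. Statistical Parity):
$$ \hat{Y} \perp S $$
Separation-based fairness (i.e. Equalised Odds/Opportunity):
$$ \hat{Y} \perp S \mid Y $$
For both to hold (in binary classification), either $S \perp Y$ (every group has the same base rate) or $\hat{Y} \perp Y$ (the predictions carry no information about the outcome).
Similarly, Sufficiency ($Y \perp S \mid \hat{Y}$) cannot hold alongside either notion of fairness, except in degenerate cases such as equal base rates across groups (or, for Separation, a perfect classifier).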
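A toy check of the first claim, as a sketch: assume two hypothetical groups of ten people with different base rates (60% vs 40% positive, numbers chosen only for illustration) and a perfect classifier that predicts $\hat{Y} = Y$. Its predictions are clearly informative, so with unequal base rates Separation and Independence cannot both hold. The helper `rates` is hypothetical.

```python
from fractions import Fraction

def rates(y_true, y_pred):
    """Acceptance rate, TPR and FPR for one group's binary labels."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    return {
        "acceptance": Fraction(sum(y_pred), len(y_pred)),  # P(Y_hat=1 | S=s)
        "tpr": Fraction(sum(pos), len(pos)),               # P(Y_hat=1 | Y=1, S=s)
        "fpr": Fraction(sum(neg), len(neg)),               # P(Y_hat=1 | Y=0, S=s)
    }

# Hypothetical groups with unequal base rates.
y_a = [1] * 6 + [0] * 4
y_b = [1] * 4 + [0] * 6

# A perfect classifier simply predicts the true label.
r_a = rates(y_a, list(y_a))
r_b = rates(y_b, list(y_b))

independence = r_a["acceptance"] == r_b["acceptance"]
separation = r_a["tpr"] == r_b["tpr"] and r_a["fpr"] == r_b["fpr"]
print(f"independence holds: {independence}")  # False: 3/5 vs 2/5
print(f"separation holds:   {separation}")    # True: TPR = 1, FPR = 0 in both
```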
Consider a university where we are in charge of admissions!
We can only accept 50% of all applicants.
10,000 applicants are female and 10,000 are male.
We have been tasked with being fair with regard to gender.
We have acceptance criteria that are highly predictive of success.
80% of those who meet the acceptance criteria will successfully graduate.
Only 10% of those who don't meet the acceptance criteria will successfully graduate.
As we're a good university, we have a lot of applications from people who don't meet the acceptance criteria.
60% of female applicants meet the acceptance criteria.
40% of male applicants meet the acceptance criteria.
Remember, we can only accept 50% of all applicants: 10,000 of the 20,000. Accepting exactly those who meet the criteria (6,000 women and 4,000 men) fills these places.

Female applicants:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | $10000 \times 0.6 \times 0.8$ | $10000 \times 0.4 \times 0.1$ |
| Don't Graduate | $10000 \times 0.6 \times 0.2$ | $10000 \times 0.4 \times 0.9$ |

Male applicants:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | $10000 \times 0.4 \times 0.8$ | $10000 \times 0.6 \times 0.1$ |
| Don't Graduate | $10000 \times 0.4 \times 0.2$ | $10000 \times 0.6 \times 0.9$ |


Female applicants:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | $4800$ | $400$ |
| Don't Graduate | $1200$ | $3600$ |

Male applicants:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | $3200$ | $600$ |
| Don't Graduate | $800$ | $5400$ |

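The arithmetic in these tables can be reproduced with a short script. This is a sketch assuming, as the scenario implies, that the current system accepts exactly the applicants who meet the acceptance criteria; the helper `table` and the constant names are hypothetical. Whole-number percentages keep the counts exact. The script also prints the graduation rate among accepted and among rejected applicants in each group, which is what calibration-by-group compares.

```python
from fractions import Fraction

APPLICANTS = 10_000                     # applicants per group
MEETS_PCT = {"female": 60, "male": 40}  # % of each group meeting the criteria
GRAD_PCT_MEETS, GRAD_PCT_NOT = 80, 10   # % who graduate, by criteria status

def table(acc_meets, acc_not, meets_total, not_total):
    """Expected counts in the slide layout:
    ((graduate & accepted, graduate & not accepted),
     (don't graduate & accepted, don't graduate & not accepted))."""
    def graduates(meets, not_meets):
        return (meets * GRAD_PCT_MEETS + not_meets * GRAD_PCT_NOT) // 100

    rej_meets, rej_not = meets_total - acc_meets, not_total - acc_not
    acc_grad = graduates(acc_meets, acc_not)
    rej_grad = graduates(rej_meets, rej_not)
    return ((acc_grad, rej_grad),
            (acc_meets + acc_not - acc_grad, rej_meets + rej_not - rej_grad))

baseline = {}
for group, pct in MEETS_PCT.items():
    meets = APPLICANTS * pct // 100
    # Current system: accept exactly the applicants who meet the criteria.
    baseline[group] = table(meets, 0, meets, APPLICANTS - meets)

for group, ((acc_g, rej_g), (acc_f, rej_f)) in baseline.items():
    print(group, baseline[group])
    # Calibration-by-group compares these two rates across groups.
    print("  graduation rate if accepted:", Fraction(acc_g, acc_g + acc_f))
    print("  graduation rate if rejected:", Fraction(rej_g, rej_g + rej_f))

total_accepted = sum(t[0][0] + t[1][0] for t in baseline.values())
print("total accepted:", total_accepted)  # 10000, i.e. 50% of 20,000
```

Both groups print 4/5 if accepted and 1/10 if rejected.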

Our current system satisfies calibration-by-group! In both groups, 80% of accepted applicants and 10% of rejected applicants graduate.
$$Y \perp S \mid \hat{Y}$$
How would we solve this problem fairly, using Statistical Parity as our measure?
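Accept 50% of female applicants and 50% of male applicants (5,000 from each group).
This means 1,000 female applicants who meet the acceptance criteria (10% of all female applicants) are rejected, whilst 1,000 male applicants who don't meet the criteria (an additional 10% of all male applicants) are accepted.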

Female applicants:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | $5000 \times 0.8$ | $(1000 \times 0.8) + (4000 \times 0.1)$ |
| Don't Graduate | $5000 \times 0.2$ | $(1000 \times 0.2) + (4000 \times 0.9)$ |

Male applicants:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | $(4000 \times 0.8) + (1000 \times 0.1)$ | $5000 \times 0.1$ |
| Don't Graduate | $(4000 \times 0.2) + (1000 \times 0.9)$ | $5000 \times 0.9$ |


Female applicants:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | $4000$ | $1200$ |
| Don't Graduate | $1000$ | $3800$ |

Male applicants:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | $3300$ | $500$ |
| Don't Graduate | $1700$ | $4500$ |

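The Statistical Parity tables follow from the same arithmetic. This sketch assumes, consistent with the numbers above, that the 5,000 accepted women all meet the criteria, while the 5,000 accepted men are the 4,000 who meet the criteria plus 1,000 who don't. The helper `table` is hypothetical.

```python
from fractions import Fraction

GRAD_PCT_MEETS, GRAD_PCT_NOT = 80, 10  # % who graduate, by criteria status

def table(acc_meets, acc_not, meets_total, not_total):
    """((graduate: accepted, not accepted), (don't graduate: accepted, not accepted))."""
    def graduates(meets, not_meets):
        return (meets * GRAD_PCT_MEETS + not_meets * GRAD_PCT_NOT) // 100

    rej_meets, rej_not = meets_total - acc_meets, not_total - acc_not
    acc_grad = graduates(acc_meets, acc_not)
    rej_grad = graduates(rej_meets, rej_not)
    return ((acc_grad, rej_grad),
            (acc_meets + acc_not - acc_grad, rej_meets + rej_not - rej_grad))

# Statistical Parity: accept 5,000 (50%) from each group of 10,000.
parity = {
    "female": table(acc_meets=5000, acc_not=0, meets_total=6000, not_total=4000),
    "male": table(acc_meets=4000, acc_not=1000, meets_total=4000, not_total=6000),
}

for group, ((acc_g, rej_g), (acc_f, rej_f)) in parity.items():
    rate = Fraction(acc_g + acc_f, acc_g + rej_g + acc_f + rej_f)
    print(group, parity[group], "acceptance rate:", rate)  # 1/2 for both groups
```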
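
Change for female applicants, compared with the original system:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | -800 | 800 |
| Don't Graduate | -200 | 200 |

Change for male applicants, compared with the original system:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | 100 | -100 |
| Don't Graduate | 900 | -900 |
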
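Each entry in the change tables is the Statistical Parity count minus the corresponding count under the original system. A minimal sketch, with the counts copied from the tables above (the `change` helper is hypothetical):

```python
# Counts copied from the tables above, laid out as
# ((graduate: accepted, not accepted), (don't graduate: accepted, not accepted)).
original = {"female": ((4800, 400), (1200, 3600)),
            "male": ((3200, 600), (800, 5400))}
parity = {"female": ((4000, 1200), (1000, 3800)),
          "male": ((3300, 500), (1700, 4500))}

def change(new, old):
    """Element-wise difference new - old between two 2x2 tables."""
    return tuple(tuple(n - o for n, o in zip(new_row, old_row))
                 for new_row, old_row in zip(new, old))

for group in original:
    print(group, change(parity[group], original[group]))
# female ((-800, 800), (-200, 200))
# male ((100, -100), (900, -900))
```

Each row sums to zero: the policy only moves applicants between the Accepted and Not accepted columns; it does not change who would graduate.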
How would we solve this problem fairly, using Equal Opportunity as our measure? Equal Opportunity requires the true positive rate, $TP / (TP + FN)$, to be the same for both groups.

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | $TP$ | $FN$ |
| Don't Graduate | $FP$ | $TN$ |

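As a quick check of why the original system fails Equal Opportunity, this sketch computes each group's true positive rate from the original counts (copied from the earlier tables, ordered as $TP$, $FN$, $FP$, $TN$). The `tpr` helper is hypothetical.

```python
from fractions import Fraction

# (TP, FN, FP, TN) for the original system, copied from the earlier tables.
original = {"female": (4800, 400, 1200, 3600),
            "male": (3200, 600, 800, 5400)}

def tpr(tp, fn, fp, tn):
    """True positive rate: share of would-be graduates who are accepted."""
    return Fraction(tp, tp + fn)

for group, counts in original.items():
    rate = tpr(*counts)
    print(f"{group}: TPR = {rate} = {float(rate):.3f}")
# female: TPR = 12/13 = 0.923
# male: TPR = 16/19 = 0.842
```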
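
Female applicants:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | $4440$ | $760$ |
| Don't Graduate | $1110$ | $3690$ |

Male applicants:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | $3245$ | $555$ |
| Don't Graduate | $1205$ | $4995$ |

Both groups now have a true positive rate of about 85.4% ($4440 / 5200$ and $3245 / 3800$).
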
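Where do 5,550 accepted women and 4,450 accepted men come from? The tables are consistent with the following reading, which we state as an assumption: every accepted woman meets the criteria, and men who meet the criteria are accepted first. If $x$ women are accepted, the remaining $10000 - x$ places go to all 4,000 qualifying men plus $6000 - x$ men who don't qualify. Equating the true positive rates gives

$$\frac{0.8x}{5200} = \frac{3800 - 0.1x}{3800} \quad\Rightarrow\quad x = \frac{5200 \times 3800}{0.8 \times 3800 + 0.1 \times 5200} = \frac{19\,760\,000}{3560} \approx 5550.6,$$

which the tables round to 5,550. The sketch below (the `tables` helper is hypothetical) solves this exactly and rebuilds the tables at $x = 5550$.

```python
from fractions import Fraction

# Solve 0.8x / 5200 == (3800 - 0.1x) / 3800 for x exactly.
x_exact = Fraction(5200 * 3800) / (Fraction(8, 10) * 3800 + Fraction(1, 10) * 5200)
print("exact x:", x_exact, "~", round(float(x_exact), 2))  # 494000/89 ~ 5550.56

def tables(x):
    """(TP, FN, FP, TN) per group when x women are accepted.

    Assumes all accepted women meet the criteria, and the remaining
    10,000 - x places go to the 4,000 qualifying men plus 6,000 - x others.
    Integer arithmetic is exact when x is a multiple of 10.
    """
    men_unqualified = 6000 - x
    f_tp = x * 8 // 10                        # 80% of accepted women graduate
    f_fn = (6000 - x) * 8 // 10 + 4000 // 10  # rejected women who would graduate
    m_tp = 4000 * 8 // 10 + men_unqualified // 10
    m_fn = (6000 - men_unqualified) // 10     # rejected men all miss the criteria
    female = (f_tp, f_fn, x - f_tp, 10000 - x - f_fn)
    male_accepted = 4000 + men_unqualified
    male = (m_tp, m_fn, male_accepted - m_tp, 10000 - male_accepted - m_fn)
    return female, male

female, male = tables(5550)  # the tables use x = 5,550
for name, (tp, fn, fp, tn) in (("female", female), ("male", male)):
    print(f"{name}: TP={tp} FN={fn} FP={fp} TN={tn}, TPR={tp / (tp + fn):.4f}")
# female: TP=4440 FN=760 FP=1110 TN=3690, TPR=0.8538
# male: TP=3245 FN=555 FP=1205 TN=4995, TPR=0.8539
```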
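
Change for female applicants, compared with the original system:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | -360 | 360 |
| Don't Graduate | -90 | 90 |

Change for male applicants, compared with the original system:

| | Accepted | Not accepted |
|---|---|---|
| Actually Graduate | 45 | -45 |
| Don't Graduate | 405 | -405 |
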
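To tie the example back to the impossibility results, this sketch scores each of the three admission policies against all three criteria, using the counts from the tables. It measures acceptance rate (Statistical Parity), true positive rate (Equal Opportunity) and graduation rate among accepted and among rejected applicants (calibration-by-group). The dictionary layout, helper names and the 0.001 tolerance are illustrative assumptions; the tolerance absorbs the rounding of 5,550.56 to 5,550.

```python
from fractions import Fraction

# (TP, FN, FP, TN) per group, copied from the tables for each policy.
POLICIES = {
    "original": {"female": (4800, 400, 1200, 3600), "male": (3200, 600, 800, 5400)},
    "statistical parity": {"female": (4000, 1200, 1000, 3800), "male": (3300, 500, 1700, 4500)},
    "equal opportunity": {"female": (4440, 760, 1110, 3690), "male": (3245, 555, 1205, 4995)},
}
TOLERANCE = Fraction(1, 1000)

def metrics(tp, fn, fp, tn):
    return {
        "acceptance rate": Fraction(tp + fp, tp + fn + fp + tn),  # independence
        "TPR": Fraction(tp, tp + fn),                             # equal opportunity
        "grad rate | accepted": Fraction(tp, tp + fp),            # calibration
        "grad rate | rejected": Fraction(fn, fn + tn),            # calibration
    }

def close(a, b):
    return abs(a - b) <= TOLERANCE

results = {}
for policy, groups in POLICIES.items():
    f, m = metrics(*groups["female"]), metrics(*groups["male"])
    results[policy] = {
        "statistical parity": close(f["acceptance rate"], m["acceptance rate"]),
        "equal opportunity": close(f["TPR"], m["TPR"]),
        "calibration": close(f["grad rate | accepted"], m["grad rate | accepted"])
        and close(f["grad rate | rejected"], m["grad rate | rejected"]),
    }
    print(f"{policy:>18}: {results[policy]}")
```

Each policy satisfies exactly the criterion it was built around and fails the other two, as expected when base rates differ (52% of female applicants would graduate, against 38% of male applicants).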