Transparency in Algorithmic Fairness

Novi Quadrianto

Reader in Machine Learning

Department of Informatics, University of Sussex

Machine learning systems

Machine learning systems are being implemented in all walks of life

Social credit system, China (picture credit: Kevin Hong)
Personal budget calculation, UK; loan decision, Finland; etc. (picture credit: AlgorithmWatch)
Financial services, crime and justice, recruitment, local government (picture credit: Centre for Data Ethics and Innovation, UK)

Algorithmic fairness definitions

## Mutual exclusivity by Bayes' rule
For each value of the protected attribute $s$, we have:
 $
 \underbrace{\text{Pr}(y=1|\widehat{y}=1)}_{\text{Positive Predictive Value (PPV)}} = \frac{\overbrace{\text{Pr}(\widehat{y}=1|y=1)}^{\text{True Positive Rate (TPR)}}\times\overbrace{\text{Pr}(y=1)}^{\text{Base Rate (BR)}}}{\text{Pr}(\widehat{y}=1|y=1)\text{Pr}(y=1) + \underbrace{\text{Pr}(\widehat{y}=1|y=-1)}_{\text{False Positive Rate (FPR)}}(1-\text{Pr}(y=1))}
 $
• Suppose we have $\text{FPR}_{s=1} = \text{FPR}_{s=0}$ and $\text{TPR}_{s=1} = \text{TPR}_{s=0}$ (equality of TPR/FPR). Can we also have $\text{PPV}_{s=1} = \text{PPV}_{s=0}$ (equality of PPV)?

• YES! But only if we have a perfect dataset (i.e. $\text{BR}_{s=1} = \text{BR}_{s=0}$) or a perfect predictor (i.e. $\text{FPR}=0$ and $\text{TPR}=1$ for both $s=1$ and $s=0$).

Kehrenberg, Chen, NQ: Tuning fairness with quantities that matter, under review.
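
A minimal numerical sketch of the Bayes' rule argument above. The TPR, FPR and base-rate values are hypothetical, chosen only to illustrate the point; the function name `ppv` is ours.

```python
def ppv(tpr: float, fpr: float, base_rate: float) -> float:
    """Positive predictive value Pr(y=1 | y_hat=1) computed via Bayes' rule."""
    true_pos = tpr * base_rate            # Pr(y_hat=1 | y=1) * Pr(y=1)
    false_pos = fpr * (1.0 - base_rate)   # Pr(y_hat=1 | y=-1) * (1 - Pr(y=1))
    return true_pos / (true_pos + false_pos)

# Hypothetical values: both groups share TPR and FPR but differ in base rate.
tpr, fpr = 0.8, 0.1
ppv_s0 = ppv(tpr, fpr, base_rate=0.5)
ppv_s1 = ppv(tpr, fpr, base_rate=0.2)
print(f"PPV s=0: {ppv_s0:.3f}, PPV s=1: {ppv_s1:.3f}")  # the two PPVs differ

# A perfect predictor (TPR=1, FPR=0) equalises PPV despite unequal base rates.
ppv_perfect_s0 = ppv(1.0, 0.0, base_rate=0.5)
ppv_perfect_s1 = ppv(1.0, 0.0, base_rate=0.2)
print(ppv_perfect_s0, ppv_perfect_s1)
```

With equal TPR and FPR, the only remaining group-dependent quantity in the formula is the base rate, which is why unequal base rates force unequal PPVs unless the false-positive term vanishes.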

Algorithmic fairness methods with transparency

Transparency in fairness

Experiments on the CelebA dataset: $202,599$ celebrity face images with $40$ attributes. Gender is the binary protected attribute, and the attribute smiling is the classification target.

representation	Acc.	TPR Diff.	TPR male	TPR female
original image representation $\mathbf{x}\in\mathbb{R}^{2048}$	89.70	7.54	92.03	84.50
fair image representation in the data domain $T_{\omega}(\mathbf{x})\in\mathbb{R}^{2048}$	91.31	4.76	91.85	87.09
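
For reference, the columns in the table above can be computed from predictions as in the following sketch. The arrays are toy data, not CelebA results; the helper names `group_tpr` and `tpr_difference` are ours, and labels follow the $y \in \{1, -1\}$ convention used earlier.

```python
import numpy as np

def group_tpr(y_true, y_pred, s, group):
    """True positive rate restricted to samples with protected attribute == group."""
    positives = (s == group) & (y_true == 1)
    return float(np.mean(y_pred[positives] == 1))

def tpr_difference(y_true, y_pred, s):
    """Absolute gap in TPR between the two protected groups."""
    return abs(group_tpr(y_true, y_pred, s, 1) - group_tpr(y_true, y_pred, s, 0))

# Toy data (hypothetical): six samples per group.
y_true = np.array([1, 1, 1, 1, -1, -1, 1, 1, 1, 1, -1, -1])
y_pred = np.array([1, 1, 1, -1, -1, 1, 1, 1, -1, -1, -1, -1])
s = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

accuracy = float(np.mean(y_true == y_pred))
tpr_s1 = group_tpr(y_true, y_pred, s, 1)
tpr_s0 = group_tpr(y_true, y_pred, s, 0)
tpr_diff = tpr_difference(y_true, y_pred, s)
print(accuracy, tpr_s1, tpr_s0, tpr_diff)
```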


Non-spurious		Spurious

## Challenges?

- Spurious/non-spurious (w.r.t. gender) visualisations are too coarse!
 <table align="center" style="border-collapse: collapse; border: none;">
 <tbody ><tr style="border: none;"><td width="40%" colspan="4" style="border: none;"></td><td width="1%" style="border: none;"></td><td width="40%" style="border: none;"></td></tr>
 <tr style="border: none;">
 <td width="30%" style="border: none;">Spurious (what we have)</td>
 <td width="20%" style="border: none;"><img src="images/celeba_res/006126res.jpg" width=100% style="margin:-20px 0px"/></td>
 <td width="30%" style="border: none;">Spurious (what we want)</td>
 <td width="20%" style="border: none;"><img src="images/spurious.png" width=60% style="margin:-20px 0px"/></td>
 </tr></tbody>
 </table>
 - Residual unfairness (transferability) problem

<center>
 <img src="images/nyclu.png" width="35%" title="New York Civil Liberties Union" style="margin:-20px 0px"/>Picture credit: New York Civil Liberties Union
 </center>

Fair and transferable

Disentangling the latent space (cf. working in the reconstruction space) into two components:

Spurious: dependent on the protected attribute
Non-Spurious: independent of the protected attribute

Kehrenberg, Bartlett, Thomas, NQ, NoSiNN: Removing spurious correlations with null-sampling, under review

## Invertible Neural Network

- We leverage flow-based models to preserve all information relevant to $y$ that is independent of $s$ (during the pre-training phase)
- Flow-based models permit exact likelihood estimation by warping a base density with a series of invertible transformations

- We conjecture that the latent representations of flow-based models are more robust to out-of-distribution data
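
The exact-likelihood property can be illustrated with a deliberately tiny flow. This is a sketch under our own assumptions, not the architecture used in the work cited above: the layers are element-wise affine maps, the base density is a standard normal, and `AffineLayer`, `log_likelihood` and `invert` are hypothetical names.

```python
import numpy as np

class AffineLayer:
    """Element-wise invertible map z = x * exp(log_scale) + shift."""

    def __init__(self, log_scale, shift):
        self.log_scale = np.asarray(log_scale, dtype=float)
        self.shift = np.asarray(shift, dtype=float)

    def forward(self, x):
        z = x * np.exp(self.log_scale) + self.shift
        log_det = float(np.sum(self.log_scale))  # log|det Jacobian| of a diagonal map
        return z, log_det

    def inverse(self, z):
        return (z - self.shift) * np.exp(-self.log_scale)

def log_likelihood(layers, x):
    """Change of variables: log p(x) = log N(z; 0, I) + sum of log|det| terms."""
    z, total_log_det = x, 0.0
    for layer in layers:
        z, log_det = layer.forward(z)
        total_log_det += log_det
    dim = z.shape[-1]
    log_base = -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * dim * np.log(2 * np.pi)
    return log_base + total_log_det, z

def invert(layers, z):
    """Recover the input exactly by applying the inverses in reverse order."""
    for layer in reversed(layers):
        z = layer.inverse(z)
    return z

layers = [AffineLayer([0.1, -0.2], [0.0, 1.0]), AffineLayer([0.3, 0.0], [-0.5, 0.2])]
x = np.array([[0.5, -1.0], [2.0, 0.3]])
log_px, z = log_likelihood(layers, x)
x_rec = invert(layers, z)
print(log_px, np.allclose(x_rec, x))
```

Because every layer is invertible, no information about the input is lost in the latent representation, which is the property the bullets above rely on.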

<!-- 
- The invertible network $f_\theta$ maps the inputs $x$ to a representation $z$: $f_\theta(x) = [z^{\text{spurious}}, z^{\text{invariant}}] := z$
- We have:
$\underset{\theta}{\text{min.}}\quad \mathbb{E}_x [- \log p_\theta (x|z)] + \lambda \text{Dep.}(z^{\text{invariant}}, s)$ with $\log p(x) = \log \mathcal{N}(z_0; 0, \mathbb{I}) + \sum_{i=1}^{L} \log | \det (\frac{\mathrm{d} f_i}{\mathrm{d}z_{i-1}})|$
-->

Transparency in fairness

Original data

Non-Spurious

Spurious

• Gender as the protected attribute

• Unfortunately, the model lightens the skin tone when gender-neutralising male faces

## Transparency in fairness

- All previous work with adversarial learning tries to remove protected attributes from the data
- Instead, we use adversarial learning to generate data points with pre-specified protected attributes (contrastive examples)
- Contrastive examples "can be easily interpreted"

<center>
 <table style="border-collapse: collapse; border: none;">
 <tbody ><tr style="border: none;"><td width="40%" colspan="4" style="border: none;"></td><td width="1%" style="border: none;"></td><td width="40%" style="border: none;"></td></tr>
 <tr style="border: none;">
 <td width="10%" style="border: none;"></td>
 <td width="30%" style="border: none;">Real</td>
 <td width="30%" style="border: none;">GAN contrastive</td>
 <td width="30%" style="border: none;">NN contrastive</td>
 </tr>
 <tr style="border: none;">
 <td width="10%" style="border: none;">Male</td>
 <td width="30%" style="border: none;"><img src="images/008526_m_real.jpg" width=60% title="RIm1"/></td>
 <td width="30%" style="border: none;"><img src="images/008526_m_fake.jpg" width=60% title="RIm1"/></td>
 <td width="30%" style="border: none;"><img src="images/008526_m_match.jpg" width=60% title="RIm1"/></td>
 </tr></tbody>
 </table></center>
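
As a rough illustration of the non-generative baseline, the sketch below assumes that "NN contrastive" means nearest-neighbour matching: each sample is paired with its closest sample, in feature space, from the opposite protected group. That reading, the Euclidean distance, and the function name `nn_contrastive` are our assumptions; the features are toy values.

```python
import numpy as np

def nn_contrastive(features, s):
    """For each sample, return the index of its nearest neighbour with a different s."""
    features = np.asarray(features, dtype=float)
    s = np.asarray(s)
    matches = np.empty(len(s), dtype=int)
    for i in range(len(s)):
        candidates = np.flatnonzero(s != s[i])  # samples from the other group
        dists = np.linalg.norm(features[candidates] - features[i], axis=1)
        matches[i] = candidates[np.argmin(dists)]
    return matches

# Toy features (hypothetical): two samples per protected group.
features = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.2]])
s = np.array([0, 0, 1, 1])
matches = nn_contrastive(features, s)
print(matches)
```

Unlike a GAN-generated example, a matched neighbour is a real data point, so it may differ from the original in more than the protected attribute.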

Results on the CelebA dataset

• We use gender and age as the two protected attributes.

• We use smiling as the classification task.

method	Acc.	TPR Diff.	FPR Diff.
logistic regression (original)	89.71	6.69	6.40
logistic regression (original and GAN contrastive)	88.94	3.50	2.79
logistic regression (original and NN contrastive)	88.78	3.32	3.53
$\ddagger$logistic regression (original and GAN contrastive with output consistency)	94.15	3.51	2.18

$\ddagger$: Rejection learning -- classifier only makes a prediction if there is an agreement between original and contrastive examples (occurs in $17,237$ out of $20,000$ test examples, i.e. $86.185\%$).
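
The output-consistency rule in the footnote can be sketched as follows. The prediction arrays are toy values, not the CelebA outputs, and `consistent_predictions` is a hypothetical helper name; only the final coverage figure uses the counts reported above.

```python
import numpy as np

def consistent_predictions(pred_original, pred_contrastive):
    """Keep a prediction only where original and contrastive inputs agree."""
    pred_original = np.asarray(pred_original)
    pred_contrastive = np.asarray(pred_contrastive)
    accept = pred_original == pred_contrastive  # boolean mask of accepted samples
    return accept, pred_original[accept]

# Toy predictions (hypothetical) with labels in {1, -1}.
pred_original = np.array([1, -1, 1, 1, -1])
pred_contrastive = np.array([1, -1, -1, 1, 1])
y_true = np.array([1, -1, 1, -1, -1])

accept, kept = consistent_predictions(pred_original, pred_contrastive)
coverage = float(accept.mean())                          # fraction of samples predicted
accuracy_on_accepted = float(np.mean(kept == y_true[accept]))
print(coverage, accuracy_on_accepted)

# Coverage from the counts reported in the footnote.
reported_coverage = 17237 / 20000
print(f"{reported_coverage:.5%}")
```

Accuracy under this rule is measured only on the accepted samples, so it is not directly comparable with the rows that predict on every test example.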