What happens when you deploy an ML model?
An introduction to performative prediction with visuals
Javier Sanguino Bautiste
Thomas Kehrenberg
Novi Quadrianto
Deployed model = Happy ML engineer?
You are a machine learning engineer working for a bank. You build a model to predict the default risk of loan applicants, and it performs very well on your test set. The company decides to deploy your model, which makes you worry about its performance in the real world. These fears turn out to be unfounded: your model works well in the weeks following deployment. Everyone congratulates you, and you are satisfied because you did your job well.
But… after some time, you realize that the model is no longer performing as well. Repeat applicants have changed their financial characteristics to increase their chances of getting a loan: they have learned to “game” your model. New applicants no longer follow the training distribution. The mere deployment of your model has triggered a distribution shift in the data!
Once an ML model is deployed, it undoubtedly has an effect on the real world. This effect has been largely overlooked in machine learning, as deployment is often the final step the ML engineer is involved in. But if you want your model to work in practice, you need to take these effects into account.
Scenarios like the one described above have been formalized in the field of performative prediction [perdomo2020performative]. Performative prediction occurs when deploying a model causes a distribution shift in the data. Let \(\theta\) be the model parameters. We can then model this effect as a dependency of the data distribution on \(\theta\) through a distribution map \(\mathcal{D}(\theta)\), a function from the set of model parameters to the space of data distributions. For a given model, \(\mathcal{D}(\theta)\) is the data distribution induced by the model parameters \(\theta\). (We will often use \(\theta\) to refer to both the model parameters and the model itself, as is common in the literature.)
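To make the distribution map concrete, here is a minimal sketch in Python for the loan scenario. Everything in it is an illustrative assumption rather than part of the formal definition: the function name `distribution_map`, the linear risk score \(\langle \theta, x \rangle\) that applicants respond to, the `gaming_strength` parameter controlling how strongly they shift their features, and the particular model of the true default probability.

```python
import numpy as np

def distribution_map(theta, n=1000, gaming_strength=0.5, rng=None):
    """Toy distribution map D(theta) for the loan example (illustrative only)."""
    rng = np.random.default_rng(rng)
    d = theta.shape[0]
    # Base (pre-deployment) features: what applicants would look like
    # if no model were deployed.
    X = rng.normal(size=(n, d))
    # Strategic response: applicants lower their predicted risk score
    # <theta, x> by moving their observed features in the -theta direction.
    X_observed = X - gaming_strength * theta
    # The true default probability depends on the *original* features,
    # so gaming changes what the bank sees, not the underlying risk.
    p_default = 1.0 / (1.0 + np.exp(-X.sum(axis=1)))
    y = rng.binomial(1, p_default)
    return X_observed, y
```

The only part that matters for the definition is the signature: the sampled data depends on the deployed parameters \(\theta\).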
The following diagram illustrates this concept. On the left-hand side is the typical ML setup: there is a fixed data distribution \(\mathcal{D}\) on which we train our model \(\theta\). On the right-hand side is the performative prediction setup: the distribution map \(\mathcal{D}(\theta)\) defines a data distribution on which we train our model \(\theta\), but in turn, the model parameters affect the data distribution.
In order to formalize this feedback loop between the model and the data distribution, we describe it as an iterative process, shown in the diagram below. Before the process begins, we have an initial data distribution \(\mathcal{D}_0\), on which the first model \(\theta^{(0)}\) is trained. Deploying this first model induces the distribution \(\mathcal{D}(\theta^{(0)})\). At each subsequent time step, we first train the model \(\theta^{(t+1)}\) on the data distribution \(\mathcal{D}(\theta^{(t)})\); then the model is deployed, which causes a new distribution shift to \(\mathcal{D}(\theta^{(t+1)})\).
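For intuition, this retraining loop can be simulated with the toy `distribution_map` sketched above. The snippet below is again only an illustration: it assumes a logistic-regression classifier from scikit-learn as the training step and an arbitrary number of retraining rounds; none of these choices is prescribed by the performative prediction framework.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

d = 3  # number of features (arbitrary choice for the illustration)

# Step 0: train the first model theta^(0) on the initial distribution D_0.
# With theta = 0 there is no strategic response yet, so D(0) plays the role of D_0.
X0, y0 = distribution_map(np.zeros(d), rng=0)
model = LogisticRegression().fit(X0, y0)
theta = model.coef_.ravel()

for t in range(5):
    # Deploying theta^(t) induces the shifted distribution D(theta^(t)).
    Xt, yt = distribution_map(theta, rng=t + 1)
    # Train theta^(t+1) on data sampled from D(theta^(t)) ...
    model = LogisticRegression().fit(Xt, yt)
    theta = model.coef_.ravel()
    # ... then deploy it and evaluate on the distribution it itself induces.
    X_next, y_next = distribution_map(theta, rng=100 + t)
    print(f"round {t}: accuracy on induced distribution {model.score(X_next, y_next):.3f}")
```

Each pass through the loop corresponds to one train-then-deploy step in the diagram: the model is fit on data induced by the previous deployment, and its own deployment then produces the data for the next round.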