Clustering Methods and Unobserved Heterogeneity
It is an obvious fact that the world is heterogeneous: some countries are organized as democracies while others are autocracies, some firms are able to produce more than others, and some workers receive stratospheric salaries while others only make the minimum wage. It is certainly also obvious, at least to the empirical researcher, that the world is heterogeneous in “unobservable ways”. That is, countries differ in their political regime even when their populations have similar income distributions, firms differ in their productivity levels even when they have similar numbers of employees and produce similar goods, and workers receive different salaries despite similar educational attainment and years of experience in a certain occupation.
As economists we might be interested in understanding the origins of heterogeneity through variation in observable characteristics. This exercise could be useful for designing actionable policies that might be welfare enhancing in some sense. For example, if years of education correlate positively with wages, we might conclude that raising mandatory educational requirements would be valuable.^{1} It is tempting to ignore other sources of heterogeneity, such as the unobservable ones, when conducting this type of analysis. After all, how could we ever write a policy based on them?
But after seminal contributions such as Heckman (1979), we are now aware that failing to account for these unobservable differences can lead to misleading conclusions about the role of observable heterogeneity in explaining outcomes. Economic agents make decisions based on information that is known only to them (and not to the econometrician), and this information drives variation in both outcomes and observable characteristics. To follow up on the education example, highly educated individuals tend to have higher salaries partly because education has positive returns on wages, but also because “smartness” is valued in the labor market. At the same time, smart individuals select into higher education because acquiring it might be less costly for them.
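The education example can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the return to education (0.08), the ability premium (0.20), and the way ability shifts schooling are hypothetical numbers chosen for illustration, not estimates. Under these assumptions the population slope of log wages on education alone is 0.08 + 0.20 × 2/5 = 0.16, twice the true return, while a regression that could observe ability would recover 0.08.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def residualize(y, z):
    """Residuals from regressing y on z (intercept included)."""
    n = len(y)
    b = ols_slope(z, y)
    mz, my = sum(z) / n, sum(y) / n
    return [yi - my - b * (zi - mz) for yi, zi in zip(y, z)]

rng = random.Random(0)
n = 20000
TRUE_RETURN = 0.08      # hypothetical return to one year of education
ABILITY_PREMIUM = 0.20  # hypothetical direct effect of ability on log wages

# Ability is seen by the worker but not by the econometrician.
ability = [rng.gauss(0, 1) for _ in range(n)]
# Higher-ability workers select into more schooling.
education = [12 + 2 * a + rng.gauss(0, 1) for a in ability]
log_wage = [TRUE_RETURN * e + ABILITY_PREMIUM * a + rng.gauss(0, 0.3)
            for e, a in zip(education, ability)]

# Regression that ignores ability: picks up the return plus the selection effect.
naive = ols_slope(education, log_wage)

# Infeasible benchmark: partial out ability from both variables first.
controlled = ols_slope(residualize(education, ability),
                       residualize(log_wage, ability))

print(f"naive slope:      {naive:.3f}")
print(f"controlled slope: {controlled:.3f}")
```

The second regression is infeasible in practice, since ability is not in the data, which is exactly why other strategies for handling unobserved heterogeneity are needed.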
Economic theory often leaves us with little guidance on how to model unobserved heterogeneity. After all, it is unobserved. Panel data, or longitudinal data, where data are collected at the individual level repeatedly over time, offer the opportunity to model unobserved heterogeneity in nonparametric ways. A popular approach is to assume that each individual’s joint distribution of observables is governed by its own set of individual-specific parameters. The multiple observations per individual allow learning about these individual-specific parameters, provided they do not vary over time. This modeling approach is called fixed effects, and it has been studied in depth from a theoretical perspective in a wide range of models.
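In the simplest linear case, the fixed effects idea amounts to subtracting each individual's own time average before running the regression. The sketch below uses a simulated panel with hypothetical parameter values (a true slope of 1.0, 500 individuals, 5 periods) and compares pooled OLS with this "within" estimator. It is an illustration of the general idea, not of any particular paper's estimator.

```python
import random

def slope(x, y):
    """Slope from a simple regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

rng = random.Random(1)
N, T, BETA = 500, 5, 1.0  # hypothetical panel dimensions and true slope

rows = []  # one (individual, x, y) tuple per individual-period observation
for i in range(N):
    alpha = rng.gauss(0, 1)          # time-invariant unobserved effect
    for _ in range(T):
        x = alpha + rng.gauss(0, 1)  # regressor correlated with the effect
        y = BETA * x + alpha + rng.gauss(0, 1)
        rows.append((i, x, y))

# Pooled OLS ignores alpha and is biased because x and alpha are correlated.
pooled = slope([r[1] for r in rows], [r[2] for r in rows])

# Within transformation: subtract each individual's time averages.
totals = {}
for i, x, y in rows:
    sx, sy, count = totals.get(i, (0.0, 0.0, 0))
    totals[i] = (sx + x, sy + y, count + 1)
x_within = [x - totals[i][0] / totals[i][2] for i, x, _ in rows]
y_within = [y - totals[i][1] / totals[i][2] for i, _, y in rows]
within = slope(x_within, y_within)

print(f"pooled OLS: {pooled:.3f}")
print(f"within:     {within:.3f}")
```

In this design the pooled slope converges to 1.5 rather than 1.0, while the within estimator removes the individual effect. Note that the transformation implicitly estimates one parameter per individual, which is the source of the parsimony problem in nonlinear models or short panels.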
Fixed effects models, while conceptually appealing, do not achieve a desirable goal in statistical modelling: a balance between flexibility and parsimony. Sufficiently flexible models avoid biases due to misspecification, but parsimonious ones (models with few parameters) allow drawing conclusions from the data with precision. Fixed effects, while very flexible, are not parsimonious, as the number of parameters grows with the sample size. In particular, their statistical properties are poor when there are few observations per individual. This makes it difficult to learn the parameters of interest with precision, and it also poses computational challenges. In fact, fixed effects estimators are not often used in practice outside linear models.
Machine learning algorithms offer attractive alternatives to account for unobserved heterogeneity in complex models. Machine learning encompasses many different tools with different goals, and there is no consensus on a definition. One that I can agree with is: “Machine Learning […] is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world.” In joint work with Stephane Bonhomme and Thibaut Lamadon, both from the University of Chicago, we use clustering methods to account for unobserved heterogeneity in flexible ways while preserving parsimony. In that sense, we use ML to “parse and learn about unobserved heterogeneity” among agents.
Clustering methods have long been used in a variety of disciplines as a way to summarize variation in the data. Given observations on individuals or firms, a clustering algorithm assigns each individual to a group or cluster according to some similarity measure. The k-means algorithm, for instance, groups individuals in such a way that the within-group variance of the data is minimized. We use k-means to group individuals with similar unobserved heterogeneity by grouping them on the basis of all their observables. In our experience, when heterogeneity among agents is not high-dimensional, a few clusters are sufficient to approximate the underlying distribution. Clusters substantially reduce the number of parameters to estimate and can alleviate biases and computational hurdles.
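As a rough illustration of the grouping step, here is a minimal k-means (Lloyd's algorithm) written with the Python standard library only. The data are simulated under hypothetical assumptions: two latent types, four observations per individual, and type means of 0 and 5. Each individual is clustered on the full vector of their observations. This is a textbook sketch, not the implementation used in the papers.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate assignment and center-update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen observations
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sqdist(p, centers[j]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return labels, centers

# Simulated panel: the latent type shifts every observation of an individual.
rng = random.Random(3)
N, T = 200, 4
TYPE_MEANS = [0.0, 5.0]  # hypothetical values for the two latent types
true_type = [i % 2 for i in range(N)]
points = [tuple(rng.gauss(TYPE_MEANS[t], 1) for _ in range(T))
          for t in true_type]

labels, centers = kmeans(points, k=2)

# Cluster labels are arbitrary, so measure agreement up to relabeling.
agree = sum(lab == t for lab, t in zip(labels, true_type)) / N
accuracy = max(agree, 1 - agree)
print(f"share of individuals grouped with their own type: {accuracy:.2f}")
```

With two groups, any downstream model needs two group-specific parameters instead of 200 individual-specific ones, which is the parsimony gain described above. In less well separated data, several random starts are usually advisable because Lloyd's algorithm only finds a local optimum.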
We use these methods to decompose the variation in wages in the economy in terms of variation in workers’ unobserved heterogeneity, firms’ unobserved heterogeneity and sorting between workers and firms.^{2} Identifying the contributions of worker and firm heterogeneity to earnings dispersion is an important step towards answering a number of economic questions, such as the nature of sorting patterns between heterogeneous workers and firms or the sources of earnings inequality.
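To fix ideas, the decomposition can be written out in the simplest additive case. This is a sketch under an assumed additive two-way model for log wages, as in the two-way fixed effects literature; the notation below is introduced here for illustration, and the models in the papers may be more general.

```latex
% Assumed additive model: worker i employed at firm j(i,t) in period t
y_{it} = \alpha_i + \psi_{j(i,t)} + \varepsilon_{it}

% If \varepsilon_{it} is uncorrelated with \alpha_i and \psi_{j(i,t)}:
\operatorname{Var}(y_{it})
  = \operatorname{Var}(\alpha_i)
  + \operatorname{Var}(\psi_{j(i,t)})
  + 2\,\operatorname{Cov}(\alpha_i, \psi_{j(i,t)})
  + \operatorname{Var}(\varepsilon_{it})
```

Here the first term captures worker heterogeneity, the second firm heterogeneity, and the covariance term captures sorting between workers and firms.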
This is a complex setting with two-sided unobserved heterogeneity, where identification of the different components of wage variation relies on comparing the wages of workers who have worked at different firms. Unfortunately, in typical matched employer-employee datasets, where wages of workers as well as an identifier of their employer are recorded over time, the number of job movers is low, even when pooling multiple years of observations. In fact, it has long been documented that variance decompositions based on two-way fixed effects estimators can suffer from severe “low-mobility bias”.
In order to alleviate these biases, we approximate the distribution of firm unobserved heterogeneity using a few clusters. We use a k-means clustering estimator to classify firms based on how similar their empirical earnings distributions are. According to our model, firms’ earnings distributions reflect both wage premiums and sorting of workers, but as long as two firms have similar distributions, their underlying unobserved heterogeneity is similar.^{3} Reducing the number of firm fixed effects alleviates small-sample biases by pooling movers across pairs of firms within the same pair of clusters.
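The classification step can be sketched as follows: summarize each firm by its empirical earnings CDF evaluated on a common grid, then run k-means on those vectors. Everything specific here is an assumption made for illustration: the simulated firms, the three latent classes, the decile grid, the unweighted distance, and the use of 20 random starts. The papers' actual implementation may differ in all of these choices.

```python
import random

def ecdf_features(earnings, grid):
    """Empirical CDF of one firm's earnings evaluated at each grid point."""
    n = len(earnings)
    return tuple(sum(1 for w in earnings if w <= g) / n for g in grid)

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed, n_iter=100):
    """One run of Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sqdist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    objective = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, objective

# Simulated firms: the latent class shifts the firm's log-earnings distribution.
rng = random.Random(2)
N_FIRMS, N_WORKERS, K = 60, 100, 3
CLASS_MEANS = [0.0, 1.0, 2.0]  # hypothetical class-level mean log earnings
true_class = [f % K for f in range(N_FIRMS)]
firm_earnings = [[rng.gauss(CLASS_MEANS[c], 0.5) for _ in range(N_WORKERS)]
                 for c in true_class]

# Common grid: deciles of the pooled earnings distribution.
pooled = sorted(w for firm in firm_earnings for w in firm)
grid = [pooled[len(pooled) * d // 10] for d in range(1, 10)]
features = [ecdf_features(e, grid) for e in firm_earnings]

# Several random starts; keep the run with the smallest objective.
labels, _ = min((kmeans(features, K, seed=s) for s in range(20)),
                key=lambda run: run[1])

# Share of firms whose cluster is dominated by their own latent class.
hits = 0
for j in range(K):
    members = [c for c, lab in zip(true_class, labels) if lab == j]
    if members:
        hits += max(members.count(c) for c in range(K))
purity = hits / N_FIRMS
print(f"cluster purity: {purity:.2f}")
```

Once firms are replaced by their cluster labels, job moves between any two firms in the same pair of clusters can be pooled, so far fewer movers are needed per estimated parameter than with one fixed effect per firm.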
We corroborate this intuition using Swedish matched employer-employee data. In simulations that mimic the Swedish data, we find that two-way fixed effects methods lead to substantial bias in sorting measures, such as the correlation between worker and firm fixed effects. In contrast, methods based on clustering recover the parameters much more precisely.
List of Papers

 Grouped Patterns of Heterogeneity – Econometrica (Vol. 83, No. 3, May 2015, 1147–1184), with S. Bonhomme.

 Distributional Framework for Matched Employer Employee Data – Working paper, 2018, with S. Bonhomme and T. Lamadon.
 Discretizing Unobserved Heterogeneity – Working paper, 2017, with S. Bonhomme and T. Lamadon.
^{1} In this argument I am not taking general equilibrium effects into consideration.
^{2} I am abstracting here from observable characteristics of workers and firms.
^{3} This identification argument can be modified to account, for instance, for cases in which firms have the same distribution of earnings but different unobserved heterogeneity. In these cases, classification may also be based on mobility patterns or longitudinal earnings information, and it can be modified to incorporate firm characteristics such as value added.