Lars Peter Hansen explained. Kind of.
The entire econo-blogosphere has its usual pieces up explaining the work of two of this year’s Nobel(ish) laureates in economics, Gene Fama and Bob Shiller. Most of them just handwave when it comes to Lars Peter Hansen’s contributions.
Disclaimer: Skip this post if you already know all about GMM and can spout out the Hansen (1982) results without even looking. Also, I am not an econometrician and I learned this stuff years ago. Substantive corrections and amplifications are very welcome in the comments. I will try not to assume that the reader knows everything in finance and econometrics except GMM. I will fail.
(Haters: Yes, yes, I’m sure Noah would never do an entire post just on the incredibly banal concept of GMM. Too bad, he is away this fall. Except of course when he’s not. Also to haters: My putting out crappy posts is an incentive for him to come back sooner.)
Generalized Method of Moments, or GMM, is a method for estimating statistical models. It allows you to write down models, estimate parameters, and test hypotheses (restrictions) on those models. It can also provide an overarching framework for econometrics. Fumio Hayashi’s textbook, which old timers shake their heads at, uses GMM as the organizing framework of its treatment and derives many classical results as special cases of GMM.
Since the claim is that GMM is particularly useful for financial data, let’s motivate with a financial model based on Hansen and Singleton (1983). It may seem preposterously unrealistic, but this is what asset pricing folks do, and you can always consider it a starting point for better models, as we do with Modigliani-Miller. Suppose the economy consists of a single, infinitely-lived representative consumer, whose von Neumann-Morgenstern utility function exhibits constant relative risk aversion, \begin{equation} U(c_t) = \frac{c_t^\gamma}{\gamma}, \quad \gamma < 1, \end{equation} where \(c_t\) is consumption in period t and \(\gamma\) governs risk aversion (the coefficient of relative risk aversion is \(1 - \gamma\)). She maximizes her expected utility \begin{equation} E_0 \left[ \sum_{t=0}^\infty \beta^t U(c_t) \right], \quad 0 < \beta < 1, \end{equation} where \(E_0\) means expectation conditional on information available at the beginning of the problem and \(\beta\) is a discount factor representing pure time preference. This utility function implies that the representative consumer prefers more consumption, other things being equal, with more weight on consumption that happens sooner rather than later, but also wants to avoid a future consumption path that is too risky. She will invest in risky assets, but exactly how risky?
If we introduce multiple assets and then solve the consumer’s problem by differentiating expected utility with respect to her holdings of each asset and setting the derivatives equal to zero, we get the first-order conditions \begin{equation} \label{returns-moment} E_t \left[ \beta \left( \frac{c_{t+1}}{c_t} \right) ^\alpha r_{i,t+1} \right] = 1, \quad i = 1, \ldots, N, \end{equation} where \( \alpha \equiv \gamma - 1 \) and \(r_{i,t+1}\) is the gross return on asset i from time t to time t+1. This approach is the basis of the entire edifice of “consumption based asset pricing” and it provides a theory for asset returns: they should be related to consumption growth, and in particular, assets that are highly correlated with consumption growth (have a high “consumption beta”) should have higher expected returns because they provide less insurance against consumption risk.
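Where does \( \eqref{returns-moment} \) come from? A quick sketch of the usual perturbation argument (nothing here beyond the standard textbook step): at an optimum, giving up a marginal unit of consumption at time t, buying asset i with it, and consuming the payoff at t+1 cannot raise expected utility, so marginal utility today must equal discounted expected marginal utility tomorrow, \begin{equation} U'(c_t) = E_t \left[ \beta \, U'(c_{t+1}) \, r_{i,t+1} \right] \quad \Longrightarrow \quad c_t^{\gamma - 1} = E_t \left[ \beta \, c_{t+1}^{\gamma - 1} \, r_{i,t+1} \right], \end{equation} since \( U'(c) = c^{\gamma - 1} \); dividing both sides by \( c_t^{\gamma - 1} \) gives \( \eqref{returns-moment} \) with \( \alpha = \gamma - 1 \).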
Equation \( \eqref{returns-moment} \) contains some variables such as \( c_t \) and \( r_{i,t+1} \) that we should hopefully be able to observe in the data. It also contains parameters \( \beta \) and \( \alpha \) (or, if you prefer, \( \gamma \)) that we would like to estimate, and then judge whether the estimates are realistic. We would also like to test whether \( \eqref{returns-moment} \) provides a good description of the consumption and returns data, or in other words, whether this is a good model.
The traditional organizing method of statistics is maximum likelihood. To apply it to our model, we would have to add an error term \( \varepsilon_t \) that represents noise and unobserved variables, specify a full probability distribution for it, and then find the parameters \( \beta \) and \( \alpha \) that maximize the likelihood (which is kind of like a probability) that the model generates the data we actually have. We would then have several ways to test the hypothesis that this model describes the data well.
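To fix ideas, here is a generic illustration only, not the actual Hansen and Singleton (1983) specification (they work with log consumption growth and log returns under a joint normality assumption): if, for a single asset, we defined the error as the deviation from the Euler equation, \( \varepsilon_{t+1} = \beta ( c_{t+1}/c_t )^\alpha r_{i,t+1} - 1 \), and assumed it were i.i.d. normal with variance \( \sigma^2 \), the thing to maximize would be the log-likelihood \begin{equation} \ell(\beta, \alpha, \sigma^2) = -\frac{T}{2} \log (2 \pi \sigma^2) - \frac{1}{2 \sigma^2} \sum_{t=1}^{T} \left[ \beta \left( \frac{c_{t+1}}{c_t} \right)^\alpha r_{i,t+1} - 1 \right]^2 . \end{equation} The normality assumption is doing a lot of work here, which is the point of the next paragraph.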
The problem with maximum likelihood methods is that we have to specify a full probability distribution for the data. It’s common to assume a normal distribution for \( \varepsilon_t \). Sometimes you can assume normality without actually imposing too many restrictions on the model, but financial returns are notorious for their fat tails, so some people always like to complain whenever normal distributions are brought up.
Hansen’s insight, based on earlier work, was that we could write down the sample analog of \( \eqref{returns-moment} \), \begin{equation} \label{sample-analog} \frac 1 T \sum_{t=1}^T \beta \left( \frac{ c_{t+1}}{c_t} \right)^\alpha r_{i,t+1} = 1, \end{equation} where instead of an abstract expected value we have an actual sample mean. Equation \( \eqref{sample-analog} \) can be filled in with observed values of consumption growth and stock returns, and then solved for \( \beta \) and \( \alpha \) (or, when there are more assets than parameters, solved as nearly as possible by minimizing a weighted distance from equality). Hansen worked out the exact assumptions under which this is statistically valid. He also derived the asymptotic properties of the resulting estimators and showed how to test restrictions on the model, so we can test whether the restriction represented by \( \eqref{returns-moment} \) is supported by the data.
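For the curious, here is a minimal sketch of what estimating \( \eqref{sample-analog} \) by GMM might look like in Python. Everything here is illustrative and mine, not Hansen’s: the data arrays are stand-ins for consumption growth and gross returns you would load yourself, and I use the identity weighting matrix rather than the efficient two-step weighting matrix from Hansen (1982).

```python
import numpy as np
from scipy.optimize import minimize

def moments(params, cons_growth, returns):
    """Sample moments from the Euler equation:
    (1/T) * sum_t beta * (c_{t+1}/c_t)^alpha * r_{i,t+1} - 1, one per asset."""
    beta, alpha = params
    # cons_growth: (T,) array of c_{t+1}/c_t; returns: (T, N) array of gross returns
    m = beta * (cons_growth[:, None] ** alpha) * returns - 1.0   # (T, N)
    return m.mean(axis=0)                                        # (N,)

def gmm_objective(params, cons_growth, returns):
    """Quadratic form g'Wg with W = identity, to keep the sketch simple."""
    g = moments(params, cons_growth, returns)
    return g @ g

# Fake data purely for illustration -- replace with real consumption and returns.
rng = np.random.default_rng(0)
T, N = 200, 3
cons_growth = 1.02 + 0.01 * rng.standard_normal(T)
returns = 1.05 + 0.10 * rng.standard_normal((T, N))

result = minimize(gmm_objective, x0=[0.95, -1.0],
                  args=(cons_growth, returns), method="Nelder-Mead")
beta_hat, alpha_hat = result.x
print(f"beta = {beta_hat:.3f}, alpha = {alpha_hat:.3f}, gamma = {alpha_hat + 1:.3f}")
```

A serious application would also use instruments (lagged consumption growth and returns) to exploit the conditional nature of \( \eqref{returns-moment} \), plug in the efficient weighting matrix, and compute standard errors and the overidentification test from the Hansen (1982) asymptotics.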
One big puzzle in consumption based asset pricing is that consumption is far smoother relative to stock returns than the theory predicts (I haven’t derived that, but manipulate \( \eqref{returns-moment} \) a little and you will see it); one of my favorite papers in this literature uses garbage as a proxy for consumption.
How does GMM relate to other methods? It turns out that you can view maximum likelihood estimation as a special case of GMM. Maximum likelihood estimation involves maximizing the likelihood function (hence the name), which implies taking the derivative of the log-likelihood (called the score function in this world) and setting it equal to zero. Well, that’s just GMM with the moment condition that the expected score is zero. Similarly, Hayashi lays out how various other classical methods in econometrics such as OLS, 2SLS and SUR can be viewed as special cases of GMM.
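To make the OLS case concrete (standard textbook material, written in my notation rather than Hayashi’s): in the linear regression \( y_t = x_t' \theta + \varepsilon_t \), the orthogonality assumption \( E[x_t \varepsilon_t] = 0 \) is itself a vector of moment conditions. Its sample analog pins down the estimator, \begin{equation} \frac{1}{T} \sum_{t=1}^{T} x_t \left( y_t - x_t' \hat\theta \right) = 0 \quad \Longrightarrow \quad \hat\theta = \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \sum_{t=1}^{T} x_t y_t, \end{equation} which is exactly OLS; so OLS is GMM with as many moment conditions as parameters. 2SLS replaces \( x_t \) in the orthogonality condition with instruments \( z_t \), and the same machinery goes through.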
People who are not expert theoretical econometricians often have to derive their own estimators for some new-fangled model they have come up with. In many contexts it is simply more natural, and easier, to use moment conditions as a starting point than to try to specify the entire (parameterized) probability distribution of errors.
One paper that I find quite neat is Richardson and Smith (1993), who propose a multivariate normality test based on GMM. For stock returns, skewness and excess kurtosis are particularly relevant, and normality implies that they are both zero. Since skewness and excess kurtosis are moments, it is natural to specify as moment conditions that they are zero, estimate using GMM, and then use the J-test to see if the moment conditions hold.
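Here is a rough univariate sketch of that idea in Python (Richardson and Smith handle the multivariate case and are more careful about serial dependence; the function names are mine, and I assume i.i.d. data so plain sample covariances can stand in for the long-run covariance matrix):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def moment_contributions(params, x):
    """Per-observation moment conditions implied by normality:
    E[x - mu] = 0, E[(x-mu)^2 - s2] = 0, E[(x-mu)^3] = 0, E[(x-mu)^4 - 3*s2^2] = 0."""
    mu, s2 = params
    d = x - mu
    return np.column_stack([d, d**2 - s2, d**3, d**4 - 3.0 * s2**2])  # (T, 4)

def normality_j_test(x):
    T = len(x)
    # First step: sample mean and variance already set the first two moments to zero.
    first_step = np.array([x.mean(), x.var()])
    # Weighting matrix: inverse covariance of the moments at the first-step estimates.
    S = np.cov(moment_contributions(first_step, x), rowvar=False)
    W = np.linalg.inv(S)
    # Second step: minimize the GMM objective g'Wg with the efficient weighting matrix.
    def objective(params):
        g = moment_contributions(params, x).mean(axis=0)
        return g @ W @ g
    res = minimize(objective, first_step, method="Nelder-Mead")
    # 4 moment conditions, 2 parameters -> J is chi-squared with 2 degrees of freedom.
    J = T * res.fun
    return J, chi2.sf(J, df=2)

# Fat-tailed data should tend to get rejected; normal data usually should not.
rng = np.random.default_rng(0)
print(normality_j_test(rng.standard_t(df=5, size=1000)))  # Student-t, fat tails
print(normality_j_test(rng.standard_normal(1000)))        # actually normal
```

The point is not this particular test (the classic Jarque-Bera statistic covers similar ground) but that “write down the moments you care about and test them” is the natural GMM workflow.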
PS. Noah will tell me I am racist for getting my Japanese names confused. I was going to add that in addition to econometrics, Hayashi is also known for his work on the economy of Ancient Greece. That’s actually Takeshi Amemiya, whose Advanced Econometrics is a good overview of the field as it stood right before the “GMM revolution”.