
Covariance matrix of the maximum likelihood estimator

by Marco Taboga, PhD

The asymptotic covariance matrix of a maximum likelihood estimator (MLE) is an unknown quantity that we need to approximate when we want to build confidence intervals around the point estimates obtained with the maximum likelihood method.

It is not to be confused with the MLE of the covariance matrix of a distribution (check this page).

This lecture presents three popular estimators that can be used to approximate the asymptotic covariance matrix of the MLE:

  1. the outer product of gradients (OPG) estimator;

  2. the Hessian estimator;

  3. the Sandwich estimator.


The setting

Let $x_{1},\ldots,x_{n}$ be the realizations of the first $n$ terms of an IID sequence $\{X_{n}\}$.

Suppose that a generic term of the sequence has probability density (or mass) function $f(x;\theta_{0})$, where $\theta_{0}$ is an unknown vector of parameters.

The maximum likelihood estimator of $\theta_{0}$ is
$$\widehat{\theta}_{n}=\arg\max_{\theta}\sum_{i=1}^{n}\ln f(x_{i};\theta)$$
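As a concrete illustration (my own example, not part of the lecture), take the exponential density $f(x;\theta)=\theta e^{-\theta x}$: the log-likelihood $\sum_{i}(\ln\theta-\theta x_{i})$ is maximized in closed form at $\widehat{\theta}_{n}=1/\bar{x}$. A minimal sketch:

```python
import numpy as np

# Hypothetical illustration: exponential density f(x; theta) = theta * exp(-theta * x).
# The log-likelihood sum(ln(theta) - theta * x_i) is maximized at theta_hat = 1 / mean(x).
rng = np.random.default_rng(0)
theta0 = 2.0                            # true parameter (unknown in practice)
x = rng.exponential(scale=1 / theta0, size=10_000)

theta_hat = 1.0 / x.mean()              # closed-form maximum likelihood estimate
```

With 10,000 observations, `theta_hat` lands close to the true value 2.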

Asymptotic covariance matrix

As proved in the lecture on maximum likelihood, under certain technical assumptions the distribution of $\widehat{\theta}_{n}$ is asymptotically normal.

In particular, the distribution of $\widehat{\theta}_{n}$ can be approximated by a multivariate normal distribution with mean $\theta_{0}$ and covariance matrix
$$\frac{1}{n}V=\frac{1}{n}\left[\mathrm{E}\left(\nabla_{\theta}\ln f(X;\theta_{0})\,\nabla_{\theta}\ln f(X;\theta_{0})^{\top}\right)\right]^{-1}$$
where:

  1. $V$ is the asymptotic covariance matrix, that is, the covariance matrix of the limit distribution of $\sqrt{n}\left(\widehat{\theta}_{n}-\theta_{0}\right)$;

  2. $\nabla_{\theta}\ln f(X;\theta_{0})$ is the gradient of the log-density (the score), evaluated at the true parameter $\theta_{0}$.

Note that we divide by $n$ because the asymptotic covariance matrix $V$ is the covariance matrix of $\sqrt{n}\left(\widehat{\theta}_{n}-\theta_{0}\right)$, while we are interested in the covariance of $\widehat{\theta}_{n}$.
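The division by $n$ can be checked numerically. In a hypothetical scalar example with exponential density $f(x;\theta)=\theta e^{-\theta x}$ (my illustration, not from the lecture), the asymptotic variance is $V=\theta_{0}^{2}$, so the variance of $\widehat{\theta}_{n}=1/\bar{x}$ across repeated samples should be close to $V/n$:

```python
import numpy as np

# Monte Carlo sketch (illustrative): for the exponential density
# f(x; theta) = theta * exp(-theta * x), the asymptotic variance is
# V = theta0**2, so Var(theta_hat) should be close to V / n.
rng = np.random.default_rng(1)
theta0, n, reps = 2.0, 500, 2_000

samples = rng.exponential(scale=1 / theta0, size=(reps, n))
theta_hats = 1.0 / samples.mean(axis=1)      # one MLE per simulated sample

empirical_var = theta_hats.var()             # variance across replications
predicted_var = theta0 ** 2 / n              # V / n = 4 / 500 = 0.008
```

The ratio `empirical_var / predicted_var` comes out close to 1, confirming that the relevant approximation for $\widehat{\theta}_{n}$ is $V/n$, not $V$.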

Information equality

Under the technical assumptions mentioned previously, the information equality holds:
$$\mathrm{E}\left[\nabla_{\theta}\ln f(X;\theta_{0})\,\nabla_{\theta}\ln f(X;\theta_{0})^{\top}\right]=-\mathrm{E}\left[H(X;\theta_{0})\right]$$
where the Hessian matrix $H(X;\theta_{0})$ is the matrix of second-order partial derivatives of the log-likelihood function.
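The equality is easy to verify numerically in a hypothetical scalar example (my own, not the lecture's): for the exponential density $f(x;\theta)=\theta e^{-\theta x}$, the score is $1/\theta-x$ and the second derivative of $\ln f$ is the constant $-1/\theta^{2}$, so both sides of the information equality equal $1/\theta_{0}^{2}$:

```python
import numpy as np

# Numerical check of the information equality (illustrative example):
# for ln f(x; theta) = ln(theta) - theta * x,
#   score   = 1/theta - x        (first derivative)
#   hessian = -1/theta**2        (second derivative, constant in x)
rng = np.random.default_rng(2)
theta0 = 2.0
x = rng.exponential(scale=1 / theta0, size=200_000)

score = 1 / theta0 - x
lhs = np.mean(score ** 2)        # E[score * score'] (outer product, scalar case)
rhs = 1 / theta0 ** 2            # -E[Hessian]
```

The simulated left-hand side matches the exact right-hand side up to Monte Carlo error.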

Outer product of gradients (OPG) estimator

The first estimator of the asymptotic covariance matrix $V$ is called the outer product of gradients (OPG) estimator. It is computed as
$$\widehat{V}_{n}=\left[\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}\ln f(x_{i};\widehat{\theta}_{n})\,\nabla_{\theta}\ln f(x_{i};\widehat{\theta}_{n})^{\top}\right]^{-1}$$

It takes its name from the fact that the gradient $\nabla_{\theta}\ln f(x_{i};\widehat{\theta}_{n})$ is a column vector, its transpose is a row vector, and the product of a column vector and a row vector is called an outer product.

Provided some regularity conditions are satisfied, the OPG estimator $\widehat{V}_{n}$ is a consistent estimator of $V$; that is, it converges in probability to $V$.

Proof

We provide only a sketch of the proof and refer the reader to Newey and McFadden (1994) for a more rigorous exposition. Provided some regularity conditions are satisfied (see the source just cited), we have the following equality between probability limits:
$$\mathrm{plim}\,\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}\ln f(x_{i};\widehat{\theta}_{n})\,\nabla_{\theta}\ln f(x_{i};\widehat{\theta}_{n})^{\top}=\mathrm{plim}\,\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}\ln f(x_{i};\theta_{0})\,\nabla_{\theta}\ln f(x_{i};\theta_{0})^{\top}$$
where $\widehat{\theta}_{n}$ has been replaced by $\theta_{0}$ because, being a consistent estimator, it converges in probability to $\theta_{0}$.

Because the sample is IID, by the Law of Large Numbers we have that
$$\mathrm{plim}\,\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}\ln f(x_{i};\theta_{0})\,\nabla_{\theta}\ln f(x_{i};\theta_{0})^{\top}=\mathrm{E}\left[\nabla_{\theta}\ln f(X;\theta_{0})\,\nabla_{\theta}\ln f(X;\theta_{0})^{\top}\right]$$

Now, the formula for the covariance matrix (see the lecture entitled Covariance matrix) yields
$$\mathrm{Var}\left[\nabla_{\theta}\ln f(X;\theta_{0})\right]=\mathrm{E}\left[\nabla_{\theta}\ln f(X;\theta_{0})\,\nabla_{\theta}\ln f(X;\theta_{0})^{\top}\right]-\mathrm{E}\left[\nabla_{\theta}\ln f(X;\theta_{0})\right]\mathrm{E}\left[\nabla_{\theta}\ln f(X;\theta_{0})\right]^{\top}$$
But the expected value of the gradient evaluated at $\theta_{0}$ is 0, so that
$$\mathrm{Var}\left[\nabla_{\theta}\ln f(X;\theta_{0})\right]=\mathrm{E}\left[\nabla_{\theta}\ln f(X;\theta_{0})\,\nabla_{\theta}\ln f(X;\theta_{0})^{\top}\right]=V^{-1}$$
Thus,
$$\mathrm{plim}\,\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}\ln f(x_{i};\widehat{\theta}_{n})\,\nabla_{\theta}\ln f(x_{i};\widehat{\theta}_{n})^{\top}=V^{-1}$$
Because matrix inversion is continuous, by the Continuous Mapping theorem we have
$$\mathrm{plim}\,\widehat{V}_{n}=\left[\mathrm{plim}\,\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}\ln f(x_{i};\widehat{\theta}_{n})\,\nabla_{\theta}\ln f(x_{i};\widehat{\theta}_{n})^{\top}\right]^{-1}=V$$
which is exactly the result we needed to prove.
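As a hypothetical scalar illustration (my example, not the lecture's), take the exponential density $f(x;\theta)=\theta e^{-\theta x}$ with MLE $\widehat{\theta}_{n}=1/\bar{x}$ and asymptotic variance $V=\theta_{0}^{2}$. The OPG estimator then reduces to the reciprocal of the average squared score evaluated at the MLE:

```python
import numpy as np

# OPG estimator in a scalar exponential illustration (hypothetical example):
# gradients at the MLE are 1/theta_hat - x_i, and the estimator is the
# inverse of their average outer product (here just an average of squares).
rng = np.random.default_rng(3)
theta0, n = 2.0, 50_000
x = rng.exponential(scale=1 / theta0, size=n)
theta_hat = 1 / x.mean()

scores = 1 / theta_hat - x              # score of each observation at the MLE
V_opg = 1 / np.mean(scores ** 2)        # should be close to V = theta0**2 = 4
```

A standard error for $\widehat{\theta}_{n}$ would then be `np.sqrt(V_opg / n)`, in line with the $V/n$ approximation discussed earlier.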

Hessian estimator

The second estimator of the asymptotic covariance matrix $V$ is called the Hessian estimator. It is computed as
$$\widetilde{V}_{n}=-\left[\frac{1}{n}\sum_{i=1}^{n}H(x_{i};\widehat{\theta}_{n})\right]^{-1}$$

Under some regularity conditions, the Hessian estimator $\widetilde{V}_{n}$ is also a consistent estimator of $V$.

Proof

Again, we do not provide an entirely rigorous proof (for which you can see Newey and McFadden, 1994) and we only sketch the main steps. First of all, under some regularity conditions, we have that
$$\mathrm{plim}\,\frac{1}{n}\sum_{i=1}^{n}H(x_{i};\widehat{\theta}_{n})=\mathrm{plim}\,\frac{1}{n}\sum_{i=1}^{n}H(x_{i};\theta_{0})$$
where $\widehat{\theta}_{n}$ has been replaced by $\theta_{0}$ because, being a consistent estimator, it converges in probability to $\theta_{0}$.

Now, since the sample is IID, by the Law of Large Numbers we have that
$$\mathrm{plim}\,\frac{1}{n}\sum_{i=1}^{n}H(x_{i};\theta_{0})=\mathrm{E}\left[H(X;\theta_{0})\right]$$
By the information equality, we have
$$\mathrm{E}\left[H(X;\theta_{0})\right]=-\mathrm{E}\left[\nabla_{\theta}\ln f(X;\theta_{0})\,\nabla_{\theta}\ln f(X;\theta_{0})^{\top}\right]=-V^{-1}$$
Therefore,
$$\mathrm{plim}\,\frac{1}{n}\sum_{i=1}^{n}H(x_{i};\widehat{\theta}_{n})=-V^{-1}$$
Because matrix inversion is continuous, by the Continuous Mapping theorem we have
$$\mathrm{plim}\,\widetilde{V}_{n}=-\left[\mathrm{plim}\,\frac{1}{n}\sum_{i=1}^{n}H(x_{i};\widehat{\theta}_{n})\right]^{-1}=-\left[-V^{-1}\right]^{-1}=V$$
which is what we needed to prove.
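In the same hypothetical scalar exponential example used for illustration (mine, not the lecture's), the second derivative of $\ln f(x;\theta)=\ln\theta-\theta x$ is $-1/\theta^{2}$, so the Hessian estimator has a simple closed form:

```python
import numpy as np

# Hessian estimator in a scalar exponential illustration (hypothetical example):
# the second derivative of ln f is -1/theta**2 (constant in x), so the sample
# average of the Hessian at the MLE is -1/theta_hat**2 and the estimator is
# its negated inverse.
rng = np.random.default_rng(4)
theta0, n = 2.0, 50_000
x = rng.exponential(scale=1 / theta0, size=n)
theta_hat = 1 / x.mean()

avg_hessian = -1 / theta_hat ** 2
V_hess = -1 / avg_hessian               # = theta_hat**2, close to V = theta0**2 = 4
```

Note that, unlike the OPG estimator, this version uses only curvature information at the MLE.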

Sandwich estimator

The third estimator of the asymptotic covariance matrix $V$ is called the Sandwich estimator. It is computed as
$$\overline{V}_{n}=\widetilde{V}_{n}\,\widehat{V}_{n}^{-1}\,\widetilde{V}_{n}$$
where $\widehat{V}_{n}$ is the OPG estimator and $\widetilde{V}_{n}$ is the Hessian estimator.

The Sandwich estimator $\overline{V}_{n}$ is also a consistent estimator of $V$.

Proof

This is again a consequence of the Continuous Mapping theorem:
$$\mathrm{plim}\,\overline{V}_{n}=\left(\mathrm{plim}\,\widetilde{V}_{n}\right)\left(\mathrm{plim}\,\widehat{V}_{n}\right)^{-1}\left(\mathrm{plim}\,\widetilde{V}_{n}\right)=V\,V^{-1}\,V=V$$
where the last equality follows from the consistency of the OPG and Hessian estimators.
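The three estimators can be compared on a single sample. A minimal sketch using a hypothetical scalar exponential example (my illustration, not the lecture's), where the OPG and Hessian estimators have the closed forms derived above and all three values should be close to $V=\theta_{0}^{2}=4$:

```python
import numpy as np

# Sandwich estimator combining the OPG and Hessian estimators
# (scalar exponential illustration, hypothetical example).
rng = np.random.default_rng(5)
theta0, n = 2.0, 50_000
x = rng.exponential(scale=1 / theta0, size=n)
theta_hat = 1 / x.mean()

scores = 1 / theta_hat - x
V_opg = 1 / np.mean(scores ** 2)            # OPG estimator
V_hess = theta_hat ** 2                     # Hessian estimator
V_sand = V_hess * (1 / V_opg) * V_hess      # sandwich: V~ * V^(-1) * V~
```

In well-specified models the three estimates agree asymptotically; the sandwich form is often preferred when the OPG and Hessian pieces may disagree in finite samples.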

References

Newey, W. K. and D. McFadden (1994) "Chapter 36: Large sample estimation and hypothesis testing", in Handbook of Econometrics, Vol. 4, Elsevier.

How to cite

Please cite as:

Taboga, Marco (2021). "Covariance matrix of the maximum likelihood estimator", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/maximum-likelihood-covariance-matrix-estimation.
