
Multivariate normal distribution - Maximum Likelihood Estimation

In this lecture we show how to derive the maximum likelihood estimators of the two parameters of a multivariate normal distribution: the mean vector and the covariance matrix.

In order to understand the derivation, you need to be familiar with the concept of the trace of a matrix.

Table of contents

Setting
The likelihood function
The log-likelihood function
Preliminaries
The maximum likelihood estimators
Information matrix
Asymptotic variance
References

Setting

Suppose we observe the first $n$ terms of an IID sequence $\{X_j\}$ of $K$-dimensional multivariate normal random vectors.

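The joint probability density function of the $j$-th term of the sequence is
$$f_X(x_j) = (2\pi)^{-K/2}\det(V_0)^{-1/2}\exp\left(-\frac{1}{2}(x_j-\mu_0)^{\top}V_0^{-1}(x_j-\mu_0)\right)$$
where:

$\mu_0$ is the $K\times 1$ mean vector;
$V_0$ is the $K\times K$ covariance matrix.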
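As a sketch of how this density can be evaluated numerically, the Python function below computes its natural logarithm. The function `mvn_logpdf` is a hypothetical helper, and NumPy is assumed to be available. It uses `np.linalg.slogdet` to obtain $\ln(\det(V))$ without overflow, and it solves a linear system instead of explicitly inverting the covariance matrix.

```python
import numpy as np


def mvn_logpdf(x, mu, V):
    """Natural log of the K-dimensional normal density with mean mu and covariance V at x.

    Hypothetical helper for illustration; V is assumed symmetric positive definite.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    V = np.asarray(V, dtype=float)
    K = mu.shape[0]
    # slogdet returns (sign, ln|det(V)|); reject V unless det(V) > 0,
    # which is necessary (not sufficient) for positive definiteness
    sign, logdet = np.linalg.slogdet(V)
    if sign <= 0:
        raise ValueError("det(V) must be strictly positive")
    d = x - mu
    # Quadratic form (x - mu)^T V^{-1} (x - mu), obtained by solving V z = x - mu
    quad = d @ np.linalg.solve(V, d)
    return -0.5 * K * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad


# With a diagonal V the density factorizes into K univariate normal densities
print(mvn_logpdf([1.0, 2.0], [0.0, 0.0], np.diag([4.0, 9.0])))
```

With a diagonal covariance matrix, the result equals the sum of the univariate normal log-densities of the individual entries.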

The covariance matrix $V_{0}$ is assumed to be positive definite, so that its determinant is strictly positive.

We use $x_1,\ldots,x_n$, that is, the realizations of the first $n$ random vectors in the sequence, to estimate the two unknown parameters $\mu_0$ and $V_0$.

The likelihood function

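The likelihood function is
$$L(\mu,V;x_1,\ldots,x_n) = (2\pi)^{-nK/2}\det(V)^{-n/2}\exp\left(-\frac{1}{2}\sum_{j=1}^{n}(x_j-\mu)^{\top}V^{-1}(x_j-\mu)\right)$$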

Proof

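Since the terms in the sequence are independent, their joint density is equal to the product of their marginal densities. As a consequence, the likelihood function can be written as
$$\begin{aligned}
L(\mu,V;x_1,\ldots,x_n) &= \prod_{j=1}^{n} f_X(x_j;\mu,V) \\
&= \prod_{j=1}^{n}(2\pi)^{-K/2}\det(V)^{-1/2}\exp\left(-\frac{1}{2}(x_j-\mu)^{\top}V^{-1}(x_j-\mu)\right) \\
&= (2\pi)^{-nK/2}\det(V)^{-n/2}\exp\left(-\frac{1}{2}\sum_{j=1}^{n}(x_j-\mu)^{\top}V^{-1}(x_j-\mu)\right)
\end{aligned}$$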
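The independence argument can be checked numerically. The sketch below evaluates the likelihood in two ways:

- once as a product of the $n$ marginal densities;
- once with the single closed-form expression, in which the determinant is raised to the power $-n/2$ and the quadratic forms are summed inside one exponential.

The function names are hypothetical, NumPy is assumed, and the simulated data, parameters and seed are arbitrary choices.

```python
import numpy as np


def mvn_pdf(x, mu, V):
    """Density of a K-dimensional normal vector at x (V assumed positive definite)."""
    K = mu.shape[0]
    d = x - mu
    quad = d @ np.linalg.solve(V, d)
    return (2 * np.pi) ** (-K / 2) * np.linalg.det(V) ** (-0.5) * np.exp(-0.5 * quad)


def likelihood_product(X, mu, V):
    """Likelihood as the product of the marginal densities of the rows of X."""
    return np.prod([mvn_pdf(x, mu, V) for x in X])


def likelihood_closed_form(X, mu, V):
    """Likelihood as one expression in n, det(V) and the summed quadratic forms."""
    n, K = X.shape
    D = X - mu  # row j is x_j - mu
    # Sum over j of (x_j - mu)^T V^{-1} (x_j - mu)
    quad_sum = np.sum(D * np.linalg.solve(V, D.T).T)
    return (2 * np.pi) ** (-n * K / 2) * np.linalg.det(V) ** (-n / 2) * np.exp(-0.5 * quad_sum)


rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])
X = rng.multivariate_normal(mu, V, size=5)  # n = 5 simulated observations
print(likelihood_product(X, mu, V), likelihood_closed_form(X, mu, V))
```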

The log-likelihood function

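The log-likelihood function is
$$l(\mu,V;x_1,\ldots,x_n) = -\frac{nK}{2}\ln(2\pi) - \frac{n}{2}\ln(\det(V)) - \frac{1}{2}\sum_{j=1}^{n}(x_j-\mu)^{\top}V^{-1}(x_j-\mu)$$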

Proof

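The log-likelihood is obtained by taking the natural logarithm of the likelihood function:
$$\begin{aligned}
l(\mu,V;x_1,\ldots,x_n) &= \ln\left(L(\mu,V;x_1,\ldots,x_n)\right) \\
&= \ln\left((2\pi)^{-nK/2}\right) + \ln\left(\det(V)^{-n/2}\right) - \frac{1}{2}\sum_{j=1}^{n}(x_j-\mu)^{\top}V^{-1}(x_j-\mu) \\
&= -\frac{nK}{2}\ln(2\pi) - \frac{n}{2}\ln(\det(V)) - \frac{1}{2}\sum_{j=1}^{n}(x_j-\mu)^{\top}V^{-1}(x_j-\mu)
\end{aligned}$$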

Note that the likelihood function is well-defined only if $\det(V)$ is strictly positive. This reflects the assumption made above that the true parameter $V_{0}$ is positive definite, which implies that the search for a maximum likelihood estimator of $V_{0}$ is restricted to the space of positive definite matrices.
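In numerical work it is the log-likelihood that gets evaluated, because a product of many densities quickly underflows to zero in floating-point arithmetic. The minimal sketch below follows the formula above. The `log_likelihood` helper is hypothetical and NumPy is assumed. Consistent with the remark on the determinant, the helper rejects matrices whose determinant is not strictly positive.

```python
import numpy as np


def log_likelihood(X, mu, V):
    """Log-likelihood of the rows of X (shape (n, K)) under a normal with mean mu, covariance V."""
    n, K = X.shape
    sign, logdet = np.linalg.slogdet(V)
    if sign <= 0:
        # ln(det(V)) is undefined: V lies outside the admissible parameter space
        raise ValueError("det(V) must be strictly positive")
    D = X - mu
    quad_sum = np.sum(D * np.linalg.solve(V, D.T).T)
    return -0.5 * n * K * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad_sum


rng = np.random.default_rng(1)
mu = np.zeros(3)
V = np.eye(3)
X = rng.multivariate_normal(mu, V, size=2000)
ll = log_likelihood(X, mu, V)
# Each density here is at most (2*pi)^(-3/2), about 0.063, so the product
# of 2000 of them underflows to 0.0 in double precision, while ll is finite
print(ll)
```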

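For convenience, we can also define the log-likelihood in terms of the precision matrix $V^{-1}$:
$$l(\mu,V^{-1};x_1,\ldots,x_n) = -\frac{nK}{2}\ln(2\pi) + \frac{n}{2}\ln(\det(V^{-1})) - \frac{1}{2}\sum_{j=1}^{n}(x_j-\mu)^{\top}V^{-1}(x_j-\mu)$$
where we have used the property of the determinant
$$\det(V^{-1}) = \frac{1}{\det(V)}$$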
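The precision parametrization can be sketched in the same way. The hypothetical `log_likelihood_precision` below (NumPy assumed) takes $P = V^{-1}$ directly, so no linear system has to be solved. The usage lines check that, on arbitrary simulated data, it returns the same value as the covariance form evaluated with $P$ set to the inverse of $V$.

```python
import numpy as np


def log_likelihood_precision(X, mu, P):
    """Log-likelihood of the rows of X (shape (n, K)) in terms of the precision matrix P = V^{-1}."""
    n, K = X.shape
    sign, logdet_P = np.linalg.slogdet(P)
    if sign <= 0:
        raise ValueError("det(P) must be strictly positive")
    D = X - mu
    # Sum over j of (x_j - mu)^T P (x_j - mu)
    quad_sum = np.einsum("ji,ik,jk->", D, P, D)
    # +n/2 ln(det(P)) replaces -n/2 ln(det(V)) because det(V^{-1}) = 1 / det(V)
    return -0.5 * n * K * np.log(2 * np.pi) + 0.5 * n * logdet_P - 0.5 * quad_sum


rng = np.random.default_rng(2)
mu = np.array([0.5, 1.0])
V = np.array([[2.0, 0.3],
              [0.3, 0.5]])
X = rng.multivariate_normal(mu, V, size=10)

# Covariance parametrization evaluated directly, for comparison
D = X - mu
ll_cov = (-0.5 * 10 * 2 * np.log(2 * np.pi) - 0.5 * 10 * np.log(np.linalg.det(V))
          - 0.5 * np.sum((D @ np.linalg.inv(V)) * D))
print(log_likelihood_precision(X, mu, np.linalg.inv(V)), ll_cov)
```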

Preliminaries

Before deriving the maximum likelihood estimators, we need to state some facts about matrices, their trace and their derivatives:

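if $a$ is a scalar, then it is equal to its trace: $a = \operatorname{tr}(a)$;
if two matrices $A$ and $B$ are such that the products $AB$ and $BA$ are both well defined, then $\operatorname{tr}(AB) = \operatorname{tr}(BA)$;
the trace is a linear operator: if $A$ and $B$ are two matrices and $a$ and $b$ are two scalars, then $\operatorname{tr}(aA + bB) = a\operatorname{tr}(A) + b\operatorname{tr}(B)$;
the gradient of the trace of the product of two matrices $A$ and $B$ with respect to $A$ is $\nabla_{A}\operatorname{tr}(AB) = B^{\top}$;
the gradient of the natural logarithm of the determinant of $A$ is $\nabla_{A}\ln(\det(A)) = (A^{-1})^{\top}$;
if $x$ is a vector and $A$ is a symmetric matrix, then $\nabla_{x}\left(x^{\top}Ax\right) = 2Ax$.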
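The trace and gradient facts listed above can be spot-checked with central finite differences. In the sketch below, the helper `numerical_gradient` is hypothetical, NumPy is assumed, and the random matrices and seed are arbitrary. Entry $(i,j)$ of each numerical gradient approximates the partial derivative with respect to entry $(i,j)$ of the argument, matching the convention in the list. For the log-determinant, the check uses $\ln|\det(A)|$, which coincides with $\ln(\det(A))$ when $\det(A) > 0$ and has the same gradient $(A^{-1})^{\top}$ for any invertible $A$.

```python
import numpy as np


def numerical_gradient(f, A, h=1e-6):
    """Central finite-difference gradient of a scalar function f at A (vector or matrix)."""
    A = np.asarray(A, dtype=float)
    G = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        E = np.zeros_like(A)
        E[idx] = h
        G[idx] = (f(A + E) - f(A - E)) / (2 * h)
    return G


rng = np.random.default_rng(3)
K = 3
A = rng.normal(size=(K, K)) + K * np.eye(K)  # generic (almost surely invertible) matrix
B = rng.normal(size=(K, K))
S = B + B.T                                  # symmetric matrix
x = rng.normal(size=K)

# tr(AB) = tr(BA)
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))
# Gradient of tr(AB) with respect to A is B^T
print(np.allclose(numerical_gradient(lambda M: np.trace(M @ B), A), B.T, atol=1e-6))
# Gradient of ln|det(A)| is (A^{-1})^T; slogdet(M)[1] is ln|det(M)|
print(np.allclose(numerical_gradient(lambda M: np.linalg.slogdet(M)[1], A),
                  np.linalg.inv(A).T, atol=1e-6))
# Gradient of x^T S x with respect to x is 2 S x
print(np.allclose(numerical_gradient(lambda v: v @ S @ v, x), 2 * S @ x, atol=1e-6))
```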

The maximum likelihood estimators

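The maximum likelihood estimators of the mean and the covariance matrix are
$$\hat{\mu}_n = \frac{1}{n}\sum_{j=1}^{n}x_j, \qquad \hat{V}_n = \frac{1}{n}\sum_{j=1}^{n}(x_j-\hat{\mu}_n)(x_j-\hat{\mu}_n)^{\top}$$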
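The two estimators above translate directly into array operations. In the sketch below, the helper `mvn_mle` is hypothetical, NumPy is assumed, and the simulated parameters and seed are arbitrary. Note that $\hat{V}_n$ divides by $n$, not by $n-1$. It therefore coincides with NumPy's `np.cov` when called with `bias=True` (and `rowvar=False`, because observations are stored as rows).

```python
import numpy as np


def mvn_mle(X):
    """ML estimates (mu_hat, V_hat) from an (n, K) array whose rows are the observations x_j."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu_hat = X.mean(axis=0)  # sample mean
    D = X - mu_hat           # deviations x_j - mu_hat
    V_hat = D.T @ D / n      # sum of outer products divided by n, not n - 1
    return mu_hat, V_hat


rng = np.random.default_rng(4)
X = rng.multivariate_normal([0.0, 2.0], [[1.0, 0.8], [0.8, 2.0]], size=500)
mu_hat, V_hat = mvn_mle(X)
print(mu_hat)
print(np.allclose(V_hat, np.cov(X, rowvar=False, bias=True)))
```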

Proof

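We need to solve the following maximization problem
$$\max_{\mu,\,V^{-1}}\; l(\mu,V^{-1};x_1,\ldots,x_n)$$
The first order conditions for a maximum are
$$\nabla_{\mu}\, l(\mu,V^{-1};x_1,\ldots,x_n) = 0, \qquad \nabla_{V^{-1}}\, l(\mu,V^{-1};x_1,\ldots,x_n) = 0$$
By the rule for the gradient of a quadratic form (applied to the symmetric matrix $V^{-1}$, together with the chain rule), the gradient of the log-likelihood with respect to the mean vector is
$$\nabla_{\mu}\, l(\mu,V^{-1};x_1,\ldots,x_n) = -\frac{1}{2}\sum_{j=1}^{n}\left(-2V^{-1}(x_j-\mu)\right) = V^{-1}\sum_{j=1}^{n}(x_j-\mu)$$
Since $V^{-1}$ is invertible, this is equal to zero only if
$$\sum_{j=1}^{n}(x_j-\mu) = 0$$
Therefore, the first of the two first-order conditions implies
$$\mu = \frac{1}{n}\sum_{j=1}^{n}x_j = \hat{\mu}_n$$
To compute the gradient with respect to the precision matrix, we first rewrite the sum of quadratic forms. Each quadratic form is a scalar, hence equal to its trace. We then apply $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ and the linearity of the trace:
$$\sum_{j=1}^{n}(x_j-\mu)^{\top}V^{-1}(x_j-\mu) = \sum_{j=1}^{n}\operatorname{tr}\left(V^{-1}(x_j-\mu)(x_j-\mu)^{\top}\right) = \operatorname{tr}\left(V^{-1}\sum_{j=1}^{n}(x_j-\mu)(x_j-\mu)^{\top}\right)$$
Then, by the rules for the gradients of $\operatorname{tr}(AB)$ and of $\ln(\det(A))$, the gradient of the log-likelihood with respect to the precision matrix is
$$\nabla_{V^{-1}}\, l(\mu,V^{-1};x_1,\ldots,x_n) = \frac{n}{2}V^{\top} - \frac{1}{2}\left(\sum_{j=1}^{n}(x_j-\mu)(x_j-\mu)^{\top}\right)^{\top}$$
By transposing the whole expression and setting it equal to zero, we get
$$\frac{n}{2}V - \frac{1}{2}\sum_{j=1}^{n}(x_j-\mu)(x_j-\mu)^{\top} = 0$$
Thus, the system of first order conditions is solved by
$$\hat{\mu}_n = \frac{1}{n}\sum_{j=1}^{n}x_j, \qquad \hat{V}_n = \frac{1}{n}\sum_{j=1}^{n}(x_j-\hat{\mu}_n)(x_j-\hat{\mu}_n)^{\top}$$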
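The first-order conditions derived in the proof can be verified numerically at the estimates. On arbitrary simulated data (NumPy assumed, seed chosen arbitrarily), the sketch below evaluates both gradients at $(\hat{\mu}_n, \hat{V}_n^{-1})$:

- the gradient with respect to $\mu$;
- the transposed gradient with respect to $V^{-1}$.

Both should vanish up to rounding error. The sketch then checks that moving away from the solution lowers the log-likelihood. The two perturbations (shifting $\mu$, scaling the precision matrix) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.multivariate_normal([1.0, 0.0, -1.0], np.diag([1.0, 2.0, 0.5]), size=200)
n, K = X.shape

# Candidate solution of the first-order conditions
mu_hat = X.mean(axis=0)
D = X - mu_hat
V_hat = D.T @ D / n
P_hat = np.linalg.inv(V_hat)

# Gradient with respect to mu: V^{-1} times the sum of (x_j - mu)
grad_mu = P_hat @ D.sum(axis=0)
# Transposed gradient with respect to V^{-1}: n/2 V - 1/2 sum_j (x_j - mu)(x_j - mu)^T
grad_P = 0.5 * n * V_hat - 0.5 * D.T @ D


def log_lik(mu, P):
    """Log-likelihood in terms of the precision matrix P (assumed positive definite)."""
    E = X - mu
    return (-0.5 * n * K * np.log(2 * np.pi) + 0.5 * n * np.linalg.slogdet(P)[1]
            - 0.5 * np.einsum("ji,ik,jk->", E, P, E))


print(np.abs(grad_mu).max(), np.abs(grad_P).max())
# Perturbing either parameter lowers the log-likelihood
print(log_lik(mu_hat, P_hat) > log_lik(mu_hat + 0.1, P_hat))
print(log_lik(mu_hat, P_hat) > log_lik(mu_hat, 1.1 * P_hat))
```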

Information matrix

We are now going to give a formula for the information matrix of the multivariate normal distribution, which will be used to derive the asymptotic covariance matrix of the maximum likelihood estimators.

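Denote by $\theta$ the $(K+K^{2})\times 1$ column vector of all parameters:
$$\theta = \begin{bmatrix}\mu \\ \operatorname{vec}(V)\end{bmatrix}$$
where $\operatorname{vec}(V)$ converts the matrix $V$ into a $K^{2}\times 1$ column vector whose entries are taken from the first column of $V$, then from the second, and so on.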
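NumPy stores arrays in row-major order by default. The column-by-column stacking performed by $\operatorname{vec}$ therefore corresponds to reshaping with `order="F"` (Fortran, column-major order). The short sketch below assumes NumPy and uses arbitrary example values. The helper `vec` is hypothetical.

```python
import numpy as np


def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")


mu = np.array([0.5, -1.0])
V = np.array([[2.0, 0.3],
              [0.3, 1.0]])
theta = np.concatenate([mu, vec(V)])  # K + K^2 = 6 entries
print(theta)
```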

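The log-likelihood of one observation $x$ from the sample can be written as
$$l(\theta;x) = -\frac{K}{2}\ln(2\pi) - \frac{1}{2}\ln(\det(V)) - \frac{1}{2}(x-\mu)^{\top}V^{-1}(x-\mu)$$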

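The information matrix is
$$I(\theta) = -\operatorname{E}\left[\nabla_{\theta\theta}\, l(\theta;X)\right]$$
where $X$ has the distribution of a single observation and $\nabla_{\theta\theta}$ denotes the Hessian with respect to $\theta$.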

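Define the vector
$$\frac{\partial\mu}{\partial\theta_m} = \begin{bmatrix}\partial\mu_1/\partial\theta_m \\ \vdots \\ \partial\mu_K/\partial\theta_m\end{bmatrix}$$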

Thus:

if $\theta_m$ is an element of $\mu$, say the $k$-th, then the $k$-th entry of the vector $\partial\mu/\partial\theta_m$ is equal to $1$ and all the other entries are equal to $0$;
if $\theta_m$ is not an element of $\mu$, then all the entries of the vector $\partial\mu/\partial\theta_m$ are equal to $0$.

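Define the $K\times K$ matrix
$$\frac{\partial V}{\partial\theta_m}$$
whose $(l,k)$-th entry is $\partial V_{lk}/\partial\theta_m$.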

Note that:

if $\theta_m$ is an element of $V$, say $V_{lk}$, then the $(l,k)$-th entry of the matrix $\partial V/\partial\theta_m$ is equal to $1$ and all the other entries are equal to $0$;
if $\theta_m$ is not an element of $V$, then all the entries of the matrix $\partial V/\partial\theta_m$ are equal to $0$.

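It can be proved (see, e.g., Pistone and Malagò 2015) that the $(m,p)$-th element of the information matrix is
$$I_{mp}(\theta) = \left(\frac{\partial\mu}{\partial\theta_m}\right)^{\top}V^{-1}\frac{\partial\mu}{\partial\theta_p} + \frac{1}{2}\operatorname{tr}\left(V^{-1}\frac{\partial V}{\partial\theta_m}V^{-1}\frac{\partial V}{\partial\theta_p}\right)$$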
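The formula for the entries of the information matrix can be turned into code almost literally. The sketch below uses a hypothetical `information_matrix` function and assumes NumPy. It builds $\partial\mu/\partial\theta_m$ and $\partial V/\partial\theta_m$ exactly as described in the two lists above, treating each entry of $\operatorname{vec}(V)$ as a separate parameter, and then fills the matrix entry by entry. For $K=1$ it reduces to the familiar univariate values $1/\sigma^2$ for the mean and $1/(2\sigma^4)$ for the variance.

```python
import numpy as np


def information_matrix(V):
    """Information matrix for theta = [mu; vec(V)], filled entry by entry from the formula.

    Hypothetical helper; the formula does not involve mu, so only V is needed.
    Each entry of vec(V) is treated as a separate parameter, as in the text.
    """
    V = np.asarray(V, dtype=float)
    K = V.shape[0]
    V_inv = np.linalg.inv(V)
    size = K + K * K

    def derivatives(m):
        """Return (d mu / d theta_m, d V / d theta_m) for the m-th parameter (0-based)."""
        d_mu = np.zeros(K)
        d_V = np.zeros((K, K))
        if m < K:
            d_mu[m] = 1.0                # theta_m is the m-th entry of mu
        else:
            col, row = divmod(m - K, K)  # vec(V) stacks the columns of V
            d_V[row, col] = 1.0          # theta_m is the entry V[row, col]
        return d_mu, d_V

    info = np.zeros((size, size))
    for m in range(size):
        d_mu_m, d_V_m = derivatives(m)
        for p in range(size):
            d_mu_p, d_V_p = derivatives(p)
            info[m, p] = (d_mu_m @ V_inv @ d_mu_p
                          + 0.5 * np.trace(V_inv @ d_V_m @ V_inv @ d_V_p))
    return info


# K = 1: theta = (mu, sigma^2) with sigma^2 = 4
print(information_matrix([[4.0]]))
```

Because $\partial V/\partial\theta_m = 0$ for every mean parameter and $\partial\mu/\partial\theta_m = 0$ for every covariance parameter, the entries linking $\mu$ and $\operatorname{vec}(V)$ are zero. The block for $\mu$ is $V^{-1}$.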

Asymptotic variance

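The vector
$$\hat{\theta}_n = \begin{bmatrix}\hat{\mu}_n \\ \operatorname{vec}(\hat{V}_n)\end{bmatrix}$$
is asymptotically normal with asymptotic mean equal to
$$\theta_0 = \begin{bmatrix}\mu_0 \\ \operatorname{vec}(V_0)\end{bmatrix}$$
and asymptotic covariance matrix equal to
$$I(\theta_0)^{-1}$$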

In more formal terms, $\sqrt{n}\left(\hat{\theta}_n - \theta_0\right)$ converges in distribution to a multivariate normal distribution with zero mean and covariance matrix $I(\theta_0)^{-1}$.

In other words, the distribution of the vector $\hat{\theta}_n$ can be approximated by a multivariate normal distribution with mean $\theta_0$ and covariance matrix
$$\frac{1}{n}I(\theta_0)^{-1}$$
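The formula for the information matrix gives zero entries whenever one index refers to an entry of $\mu$ and the other to an entry of $\operatorname{vec}(V)$, and its mean block equals $V^{-1}$. The approximation above therefore assigns to $\hat{\mu}_n$ the covariance matrix $V_0/n$. For the sample mean of IID normal vectors this covariance is in fact exact for every $n$.

A quick Monte Carlo sketch can compare this value with the simulated covariance of $\hat{\mu}_n$. It assumes NumPy, and $\mu_0$, $V_0$, $n$, the number of replications and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
mu0 = np.array([1.0, -2.0])
V0 = np.array([[1.0, 0.6],
               [0.6, 2.0]])
n, reps = 100, 5000

# reps independent samples of size n; array shape (reps, n, K)
samples = rng.multivariate_normal(mu0, V0, size=(reps, n))
# ML estimate of the mean in each replication, shape (reps, K)
mu_hats = samples.mean(axis=1)

# Simulated covariance of mu_hat versus the approximation V0 / n
empirical_cov = np.cov(mu_hats, rowvar=False)
print(empirical_cov)
print(V0 / n)
```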

References

Pistone, G. and Malagò, L. (2015) "Information Geometry of the Gaussian Distribution in View of Stochastic Optimization", Proceedings of the 2015 ACM Conference on Foundations of Genetic Algorithms XIII, 150-162.

How to cite

Please cite as:

Taboga, Marco (2021). "Multivariate normal distribution - Maximum Likelihood Estimation", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/multivariate-normal-distribution-maximum-likelihood.