
Central Limit Theorem


Central Limit Theorems (CLTs) state conditions that are sufficient to guarantee that the distribution of the standardized sample mean converges to a normal distribution as the sample size increases.


Sample mean

As Central Limit Theorems concern the sample mean, we first define it precisely.

Let $\{X_n\}$ be a sequence of random variables.

We will denote by $\bar{X}_n$ the sample mean of the first $n$ terms of the sequence: $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.

When the sample size n increases, we add more observations X_i to the sample mean.

Note that the sample mean, being a sum of random variables, is itself a random variable.

Convergence to the normal distribution

The Central Limit Theorem tells us what happens to the distribution of the sample mean when we increase the sample size.

Remember that if the conditions of a Law of Large Numbers apply, the sample mean converges in probability to the expected value of the observations, that is, $\bar{X}_n \overset{p}{\rightarrow} \mu$, where $\mu$ is the expected value of the observations.

In a Central Limit Theorem, we first standardize the sample mean, that is, we subtract from it its expected value and we divide it by its standard deviation. Then, we analyze the behavior of its distribution as the sample size gets large. What happens is that the standardized sample mean converges in distribution to a normal distribution: $\frac{\bar{X}_n - \mathrm{E}\left[\bar{X}_n\right]}{\mathrm{std}\left[\bar{X}_n\right]} \overset{d}{\rightarrow} Z$, where $Z$ is a standard normal random variable.

In the important case in which the variables $X_i$ are independently and identically distributed (IID) with mean $\mu$ and variance $\sigma^2$, the formula above becomes $\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \overset{d}{\rightarrow} Z$, because $\mathrm{E}\left[\bar{X}_n\right] = \mu$ and $\mathrm{std}\left[\bar{X}_n\right] = \frac{\sigma}{\sqrt{n}}$.
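This convergence can be checked numerically. The following sketch simulates many sample means and standardizes them; the exponential distribution, sample size, and number of replications are illustrative choices, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: exponential observations with mean mu = 1
# and standard deviation sigma = 1.
n, n_samples = 500, 10_000
mu, sigma = 1.0, 1.0
samples = rng.exponential(scale=1.0, size=(n_samples, n))

# Standardized sample mean: sqrt(n) * (Xbar_n - mu) / sigma
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma

# By the CLT, z should behave approximately like a standard normal variable
print(round(z.mean(), 2), round(z.std(), 2))  # close to 0 and 1
```

Even though the exponential distribution is highly skewed, the standardized sample mean of 500 observations is already close to standard normal.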

Intuition

Several students are confused by the fact that the sample mean converges to a constant in the Law of Large Numbers, while it converges to a normal distribution in the Central Limit Theorem. This seems like a contradiction: a normal distribution is not a constant!

The formula for the IID case may help to eliminate this kind of doubt: in the Law of Large Numbers, the variance of the sample mean converges to zero, while in the Central Limit Theorem the difference between the sample mean and its expected value is multiplied by $\sqrt{n}$, so that the variance of $\sqrt{n}\,\left(\bar{X}_n - \mu\right)$ stays constant at $\sigma^2$.
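A small simulation makes this concrete. In the sketch below (uniform observations and the sample sizes are illustrative choices), the variance of $\bar{X}_n$ shrinks with $n$, while the variance of $\sqrt{n}\,\bar{X}_n$ stays near the constant $\sigma^2 = 1/12$:

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 20_000

for n in (10, 100, 1000):
    # Sample means of n uniform(0, 1) draws; one draw has variance 1/12
    xbar = rng.random((n_samples, n)).mean(axis=1)
    # LLN view: Var(Xbar_n) = (1/12) / n shrinks to zero as n grows.
    # CLT view: Var(sqrt(n) * Xbar_n) = n * Var(Xbar_n) stays near 1/12.
    print(n, round(xbar.var(), 5), round(n * xbar.var(), 4))
```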

How the Central Limit Theorem is used in practice

In practice, the CLT is used as follows:

  1. we observe a sample consisting of $n$ observations $X_1, X_2, \ldots, X_n$;

  2. if n is large enough, then a standard normal distribution is a good approximation of the distribution of the standardized sample mean;

  3. therefore, we pretend that $\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \sim N(0,1)$, where $N(0,1)$ indicates the normal distribution with mean 0 and variance 1;

  4. as a consequence, the distribution of the sample mean $\bar{X}_n$ is approximately $N\left(\mu, \frac{\sigma^2}{n}\right)$.
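The four steps above can be sketched in code. The numbers below ($n$, $\mu$, $\sigma$, and the threshold 10.5) are hypothetical, chosen only to illustrate the recipe:

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """CDF of the N(mean, sd^2) distribution via the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

# Hypothetical numbers: n = 64 observations whose common distribution
# has mean mu = 10 and standard deviation sigma = 4.
n, mu, sigma = 64, 10.0, 4.0

# Step 4: treat Xbar_n as N(mu, sigma^2 / n) = N(10, 0.25), so sd = 0.5
sd_xbar = sigma / sqrt(n)

# Approximate P(Xbar_n <= 10.5) = Phi((10.5 - 10) / 0.5) = Phi(1)
p = normal_cdf(10.5, mean=mu, sd=sd_xbar)
print(round(p, 4))  # 0.8413
```

With these hypothetical numbers, the recipe gives $P\left(\bar{X}_n \leq 10.5\right) \approx \Phi(1) \approx 0.84$.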

Examples

There are several Central Limit Theorems. We report some examples below.

Lindeberg-Lévy Central Limit Theorem

The best known Central Limit Theorem is probably the Lindeberg-Lévy CLT:

Proposition (Lindeberg-Lévy CLT) Let $\{X_n\}$ be an IID sequence of random variables such that $\mathrm{E}\left[X_n\right] = \mu < \infty$ and $\mathrm{Var}\left[X_n\right] = \sigma^2 < \infty$, where $\sigma^2 > 0$. Then, a Central Limit Theorem applies to the sample mean $\bar{X}_n$: $\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \overset{d}{\rightarrow} Z$, where $Z$ is a standard normal random variable and $\overset{d}{\rightarrow}$ denotes convergence in distribution.

Proof

We will just sketch a proof. For a detailed and rigorous proof see, for example: Resnick (1999) and Williams (1991). First of all, denote by $\{Z_n\}$ the sequence whose generic term is $Z_n = \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Y_i$, where $Y_i = \frac{X_i - \mu}{\sigma}$. Since the variables $Y_i$ are IID with mean 0 and variance 1, the characteristic function of $Z_n$ is $\varphi_{Z_n}(t) = \left[\varphi_Y\!\left(\frac{t}{\sqrt{n}}\right)\right]^n$, where $\varphi_Y$ is the characteristic function of a generic $Y_i$. Now take a second order Taylor series expansion of $\varphi_Y(s)$ around the point $s=0$: $\varphi_Y(s) = 1 - \frac{1}{2}s^2 + o\left(s^2\right)$, where $o\left(s^2\right)$ is an infinitesimal of higher order than $s^2$, that is, a quantity that converges to 0 faster than $s^2$ does. Therefore, $\varphi_{Z_n}(t) = \left[1 - \frac{t^2}{2n} + o\!\left(\frac{t^2}{n}\right)\right]^n$. So, we have that $\lim_{n\rightarrow\infty}\varphi_{Z_n}(t) = \exp\!\left(-\frac{t^2}{2}\right)$, which is the characteristic function of a standard normal random variable $Z$ (see the lecture entitled Normal distribution). A theorem, called Lévy continuity theorem, which we do not cover in these lectures, states that if a sequence of random variables $\{Z_n\}$ is such that their characteristic functions $\varphi_{Z_n}$ converge to the characteristic function $\varphi_Z$ of a random variable $Z$, then the sequence $\{Z_n\}$ converges in distribution to $Z$. Therefore, in our case the sequence $\{Z_n\}$ converges in distribution to a standard normal distribution.
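The key step of the proof can be illustrated numerically. For a Rademacher variable (a simple choice with mean 0 and variance 1, not taken from the lecture), the characteristic function is $\cos(s)$, and $\left[\cos\!\left(t/\sqrt{n}\right)\right]^n$ indeed approaches $e^{-t^2/2}$ as $n$ grows:

```python
import math

# A Rademacher variable Y (+1 or -1 with probability 1/2 each) has mean 0,
# variance 1, and characteristic function phi_Y(s) = cos(s).
# The proof's key step: phi_{Z_n}(t) = [phi_Y(t / sqrt(n))]^n -> exp(-t^2 / 2).
t = 1.5
for n in (10, 100, 10_000):
    print(n, round(math.cos(t / math.sqrt(n)) ** n, 4))
print("limit:", round(math.exp(-t ** 2 / 2), 4))
```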

So, roughly speaking, under the stated assumptions, the distribution of the sample mean $\bar{X}_n$ can be approximated by a normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$ (provided $n$ is large enough).

Also note that the conditions for the validity of the Lindeberg-Lévy Central Limit Theorem resemble the conditions for the validity of Kolmogorov's Strong Law of Large Numbers. The only difference is the additional requirement that $\mathrm{Var}\left[X_n\right] = \sigma^2 < \infty$.

The Central Limit Theorem for correlated sequences

In the Lindeberg-Lévy CLT (see above), the sequence $\{X_n\}$ is required to be an IID sequence. The assumption of independence can be weakened as follows.

Proposition (CLT for correlated sequences) Let $\{X_n\}$ be a stationary and mixing sequence of random variables satisfying a CLT technical condition (defined in the proof below) and such that $\mathrm{E}\left[X_n\right] = \mu < \infty$ and $V = \lim_{n\rightarrow\infty}\mathrm{Var}\left[\sqrt{n}\,\bar{X}_n\right] < \infty$, where $V > 0$. Then, a Central Limit Theorem applies to the sample mean $\bar{X}_n$: $\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sqrt{V}} \overset{d}{\rightarrow} Z$, where $Z$ is a standard normal random variable and $\overset{d}{\rightarrow}$ indicates convergence in distribution.

Proof

Several different technical conditions (beyond those explicitly stated in the above proposition) are imposed in the literature in order to derive Central Limit Theorems for correlated sequences. These conditions are usually very mild and differ from author to author. We do not mention these technical conditions here and just refer to them as CLT technical conditions.

For a proof, see for example Durrett (2010) and White (2001).

So, roughly speaking, under the stated assumptions, the distribution of the sample mean $\bar{X}_n$ can be approximated by a normal distribution with mean $\mu$ and variance $\frac{V}{n}$ (provided $n$ is large enough).

Also note that the conditions for the validity of the Central Limit Theorem for correlated sequences resemble the conditions for the validity of the ergodic theorem. The main differences (beyond some technical conditions that are not explicitly stated in the above proposition) are the additional requirement that the long-run variance $V = \lim_{n\rightarrow\infty}\mathrm{Var}\left[\sqrt{n}\,\bar{X}_n\right]$ exists, is finite, and is strictly positive, and the fact that ergodicity is replaced by the stronger condition of mixing.

Finally, let us mention that the variance $V$ in the above proposition, which is defined as $V = \lim_{n\rightarrow\infty}\mathrm{Var}\left[\sqrt{n}\,\bar{X}_n\right]$, is called the long-run variance of $\bar{X}_n$.
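The long-run variance can be illustrated with a simulated correlated sequence. The MA(1) process below is an illustrative choice of a stationary, mixing sequence; its long-run variance $(1+\theta)^2$ exceeds the variance $1+\theta^2$ of a single term because successive terms are positively correlated:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stationary, mixing sequence: the MA(1) process
# Y_t = e_t + 0.5 * e_{t-1}, with e_t IID standard normal.
theta, T, n_reps = 0.5, 2_000, 5_000
e = rng.standard_normal((n_reps, T + 1))
y = e[:, 1:] + theta * e[:, :-1]

# For this process the long-run variance is V = (1 + theta)^2 = 2.25,
# while the variance of a single term is only 1 + theta^2 = 1.25.
ybar = y.mean(axis=1)
v_hat = T * ybar.var()        # empirical n * Var(Ybar_n)
print(round(v_hat, 2))        # close to 2.25
```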

Multivariate generalizations

The results illustrated above for sequences of random variables extend in a straightforward manner to sequences of random vectors. For example, the multivariate version of the Lindeberg-Lévy CLT is as follows.

Proposition (Multivariate Lindeberg-Lévy CLT) Let $\{X_n\}$ be an IID sequence of $K \times 1$ random vectors such that $\mathrm{E}\left[X_n\right] = \mu$ and $\mathrm{Var}\left[X_n\right] = \Sigma$, where $\Sigma$ is an invertible covariance matrix. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the vector of sample means. Then, $\sqrt{n}\,\Sigma^{-1/2}\left(\bar{X}_n - \mu\right) \overset{d}{\rightarrow} Z$, where $Z$ is a standard multivariate normal random vector and $\overset{d}{\rightarrow}$ denotes convergence in distribution.

Proof

For a proof see, for example, Basu (2004), DasGupta (2008) and McCabe and Tremayne (1993).

In a similar manner, the CLT for correlated sequences generalizes to random vectors (V becomes a matrix, called long-run covariance matrix).
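A quick simulation illustrates the multivariate scaling. The mean vector and covariance matrix below are illustrative choices; the sketch checks that $\sqrt{n}\,\left(\bar{X}_n - \mu\right)$ has covariance close to $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative 2x1 random vectors with mean mu and invertible covariance Sigma
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
n, n_reps = 400, 8_000

x = rng.multivariate_normal(mu, Sigma, size=(n_reps, n))
xbar = x.mean(axis=1)              # one sample-mean vector per replication

# Multivariate CLT scaling: sqrt(n) * (Xbar_n - mu) has covariance near Sigma
z = np.sqrt(n) * (xbar - mu)
print(np.round(np.cov(z.T), 2))
```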

Solved exercises

Below you can find some exercises with explained solutions.

Exercise 1

Let $\{X_n\}$ be a sequence of independent Bernoulli random variables with parameter $\frac{1}{2}$, i.e. a generic term $X_n$ of the sequence has support $R_X = \{0, 1\}$ and probability mass function $p_X(x) = \frac{1}{2}$ for $x \in R_X$ (and $p_X(x) = 0$ otherwise).

Use a Central Limit Theorem to derive an approximate distribution for the mean of the first $100$ terms of the sequence.

Solution

The sequence $\{X_n\}$ is an IID sequence. The mean of a generic term of the sequence is $\mathrm{E}\left[X_n\right] = \frac{1}{2}$. The variance of a generic term of the sequence can be derived thanks to the usual formula for computing the variance ($\mathrm{Var}\left[X_n\right] = \mathrm{E}\left[X_n^2\right] - \mathrm{E}\left[X_n\right]^2$): $\mathrm{Var}\left[X_n\right] = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}$. Therefore, the sequence $\{X_n\}$ satisfies the conditions of the Lindeberg-Lévy Central Limit Theorem (IID, finite mean, finite variance). The mean of the first $100$ terms of the sequence is $\bar{X}_{100} = \frac{1}{100}\sum_{i=1}^{100} X_i$. Using the Central Limit Theorem to approximate its distribution, we obtain $\sqrt{100}\,\frac{\bar{X}_{100} - \frac{1}{2}}{\frac{1}{2}} \approx N(0,1)$ or $\bar{X}_{100} \approx N\!\left(\frac{1}{2}, \frac{1}{400}\right)$.
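This approximation is easy to check by simulation (the seed and number of replications below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate the mean of the first 100 terms of an IID Bernoulli(1/2) sequence
n, n_reps = 100, 50_000
xbar = rng.integers(0, 2, size=(n_reps, n)).mean(axis=1)

# The approximation N(1/2, 1/400) predicts mean 0.5 and variance 0.0025
print(round(xbar.mean(), 3), round(xbar.var(), 4))
```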

Exercise 2

Let $\{X_n\}$ be a sequence of independent Bernoulli random variables with parameter $\frac{1}{2}$, as in the previous exercise. Let $\{Y_n\}$ be another sequence of random variables such that [eq57]

Suppose $\{Y_n\}$ satisfies the conditions of a Central Limit Theorem for correlated sequences. Derive an approximate distribution for the mean of the first $n$ terms of the sequence $\{Y_n\}$.

Solution

The sequence $\{X_n\}$ is an IID sequence. The mean of a generic term of the sequence $\{Y_n\}$ is [eq61] The variance of a generic term of the sequence is [eq62] The covariance between two successive terms of the sequence is [eq63] The covariance between two terms that are not adjacent ($Y_n$ and $Y_{n+j}$, with $j>1$) is [eq64] The long-run variance is [eq65] The mean of the first $n$ terms of the sequence $\{Y_n\}$ is [eq67] Using the Central Limit Theorem for correlated sequences to approximate its distribution, we obtain [eq68] or [eq69]

Exercise 3

Let $Y$ be a binomial random variable with parameters $n = 100$ and $p = \frac{1}{2}$ (you need to read the lecture entitled Binomial distribution in order to be able to solve this exercise). By using the Central Limit Theorem, show that a normal random variable $X$ with mean $\mu = 50$ and variance $\sigma^2 = 25$ can be used as an approximation of $Y$.

Solution

A binomial random variable $Y$ with parameters $n = 100$ and $p = \frac{1}{2}$ can be written as $Y = \sum_{i=1}^{100} X_i$, where $X_1, \ldots, X_{100}$ are mutually independent Bernoulli random variables with parameter $\frac{1}{2}$. Thus, $Y = 100\,\bar{X}_{100}$. In the first exercise, we have shown that the distribution of $\bar{X}_{100}$ can be approximated by a normal distribution: $\bar{X}_{100} \approx N\!\left(\frac{1}{2}, \frac{1}{400}\right)$. Therefore, the distribution of $Y$ can be approximated by $Y = 100\,\bar{X}_{100} \approx N\!\left(100 \cdot \frac{1}{2},\; 100^2 \cdot \frac{1}{400}\right) = N(50, 25)$. Thus, $Y$ can be approximated by a normal distribution with mean $\mu = 50$ and variance $\sigma^2 = 25$.
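As a check, the exact binomial CDF can be compared with the $N(50, 25)$ approximation at a few points (the evaluation points below are arbitrary choices):

```python
from math import comb, erf, sqrt

def normal_cdf(x, mean, sd):
    """CDF of the N(mean, sd^2) distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def binom_cdf(k, n=100, p=0.5):
    """Exact CDF of a binomial(n, p) random variable."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

# Compare the exact CDF of Y ~ binomial(100, 1/2) with the N(50, 25)
# approximation (standard deviation 5) at a few points.
for k in (45, 50, 55, 60):
    print(k, round(binom_cdf(k), 3), round(normal_cdf(k, 50, 5), 3))
```

The two CDFs agree to within a few hundredths; the small remaining gap is the continuity-correction effect of approximating a discrete distribution by a continuous one.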

References

Basu, A. K. (2004) Measure theory and probability, PHI Learning Pvt. Ltd.

DasGupta, A. (2008) Asymptotic theory of statistics and probability, Springer.

Durrett, R. (2010) Probability: theory and examples, Cambridge University Press.

McCabe, B. and A. Tremayne (1993) Elements of modern asymptotic theory with statistical applications, Manchester University Press.

Resnick, S. I. (1999) A probability path, Birkhäuser.

White, H. (2001) Asymptotic theory for econometricians, Academic Press.

Williams, D. (1991) Probability with martingales, Cambridge University Press.
