Estimation of the mean

by Marco Taboga, PhD

Mean estimation is a statistical inference problem in which a sample is used to produce a point estimate of the mean of an unknown distribution.

The problem is typically solved by using the sample mean as an estimator of the population mean.

In this lecture, we present two examples, concerning:

  1. normal IID samples;

  2. IID samples that are not necessarily normal.

For each of these two cases, we derive the expected value, the variance and the asymptotic properties of the mean estimator.

Normal IID samples

In this example of mean estimation, which is probably the most important in the history of statistics, the sample is drawn from a normal distribution.

Specifically, we observe the realizations of $n$ independent random variables $X_1,\dots,X_n$, all having a normal distribution with unknown mean $\mu$ and variance $\sigma^{2}$.

The estimator

As an estimator of the mean $\mu$, we use the sample mean
$$\bar{X}_n=\frac{1}{n}\sum_{i=1}^{n}X_i$$

Expected value of the estimator

The expected value of the estimator $\bar{X}_n$ is equal to the true mean $\mu$.

This can be proved by using the linearity of the expected value:
$$\mathrm{E}[\bar{X}_n]=\mathrm{E}\!\left[\frac{1}{n}\sum_{i=1}^{n}X_i\right]=\frac{1}{n}\sum_{i=1}^{n}\mathrm{E}[X_i]=\frac{1}{n}\,n\mu=\mu$$

Therefore, the estimator $\bar{X}_n$ is unbiased.

Variance of the estimator

The variance of the estimator $\bar{X}_n$ is equal to $\sigma^{2}/n$.

This can be proved by using the formula for the variance of a sum of independent random variables:
$$\mathrm{Var}[\bar{X}_n]=\mathrm{Var}\!\left[\frac{1}{n}\sum_{i=1}^{n}X_i\right]=\frac{1}{n^{2}}\sum_{i=1}^{n}\mathrm{Var}[X_i]=\frac{1}{n^{2}}\,n\sigma^{2}=\frac{\sigma^{2}}{n}$$

Therefore, the variance of the estimator tends to zero as the sample size n tends to infinity.
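
The two results above are easy to check by simulation. The following sketch (an illustration added here, not part of the lecture; the values of mu, sigma, n and the number of replications are arbitrary) draws many normal samples and compares the empirical mean and variance of $\bar{X}_n$ with the theoretical values $\mu$ and $\sigma^{2}/n$.

```python
import numpy as np

# Arbitrary illustration values (not taken from the lecture)
mu, sigma, n, n_reps = 2.0, 3.0, 50, 100_000

rng = np.random.default_rng(0)
# Each row is one sample of size n; each row mean is one realization of Xbar_n
samples = rng.normal(loc=mu, scale=sigma, size=(n_reps, n))
xbar = samples.mean(axis=1)

print("empirical mean of Xbar_n:    ", xbar.mean(), "  theoretical:", mu)
print("empirical variance of Xbar_n:", xbar.var(), "  theoretical:", sigma**2 / n)
```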

Distribution of the estimator

The estimator $\bar{X}_n$ has a normal distribution:
$$\bar{X}_n\sim N\!\left(\mu,\frac{\sigma^{2}}{n}\right)$$

Proof

Note that the sample mean $\bar{X}_n$ is a linear combination of the normal and independent random variables $X_1,\dots,X_n$ (all the coefficients of the linear combination are equal to $\frac{1}{n}$). Therefore, $\bar{X}_n$ is normal because a linear combination of independent normal random variables is normal. The mean and the variance of the distribution have already been derived above.
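
As an informal numerical check of this result (again an added illustration with arbitrary parameter values), the sketch below standardizes simulated sample means using the exact mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ and verifies that they behave like a standard normal variable.

```python
import numpy as np

# Arbitrary illustration values (not taken from the lecture)
mu, sigma, n, n_reps = 0.0, 2.0, 10, 200_000

rng = np.random.default_rng(1)
xbar = rng.normal(mu, sigma, size=(n_reps, n)).mean(axis=1)

# Standardize with the exact mean and standard deviation of Xbar_n
z = (xbar - mu) / (sigma / np.sqrt(n))

# If Xbar_n ~ N(mu, sigma^2/n), then z behaves like a standard normal variable
print("P(|z| < 1):", np.mean(np.abs(z) < 1), " (standard normal: about 0.683)")
print("P(|z| < 2):", np.mean(np.abs(z) < 2), " (standard normal: about 0.954)")
```

Note that here the sample size $n$ is small: under normal sampling the distribution of $\bar{X}_n$ is exactly normal for every $n$, not only asymptotically.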

Risk of the estimator

The mean squared error of the estimator is
$$\mathrm{MSE}(\bar{X}_n)=\mathrm{E}\!\left[(\bar{X}_n-\mu)^{2}\right]=\mathrm{Var}[\bar{X}_n]=\frac{\sigma^{2}}{n}$$
where the second equality holds because the estimator is unbiased.

Consistency of the estimator

The sequence $\{X_n\}$ is an IID sequence with finite mean.

Therefore, it satisfies the conditions of Kolmogorov's Strong Law of Large Numbers.

Hence, the sample mean $\bar{X}_n$ converges almost surely to the true mean $\mu$:
$$\bar{X}_n\ \xrightarrow{\text{a.s.}}\ \mu$$
that is, the estimator $\bar{X}_n$ is strongly consistent.

The estimator is also weakly consistent because almost sure convergence implies convergence in probability:
$$\bar{X}_n\ \xrightarrow{P}\ \mu$$
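
A simple way to visualize strong consistency (an added illustration with arbitrary parameter values) is to follow the sample mean along a single, growing sample: the running value of $\bar{X}_n$ settles near the true mean $\mu$ as $n$ increases.

```python
import numpy as np

# Arbitrary illustration values (not taken from the lecture)
mu, sigma = 5.0, 4.0

rng = np.random.default_rng(2)
draws = rng.normal(mu, sigma, size=1_000_000)

# Running sample mean Xbar_n computed along a single realization of the sequence
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

for n in (10, 1_000, 100_000, 1_000_000):
    print(f"n = {n:>9}:  Xbar_n = {running_mean[n - 1]:.4f}   (true mean {mu})")
```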

IID samples

In this example of mean estimation, we relax the previously made assumption of normality.

The sample consists of the realizations of $n$ independent random variables $X_1,\dots,X_n$, all having the same distribution with mean $\mu$ and variance $\sigma^{2}$.

The estimator

Again, the estimator of the mean $\mu$ is the sample mean:
$$\bar{X}_n=\frac{1}{n}\sum_{i=1}^{n}X_i$$

Expected value of the estimator

The expected value of the estimator $\bar{X}_n$ is equal to the true mean:
$$\mathrm{E}[\bar{X}_n]=\mu$$

Therefore, the estimator is unbiased.

The proof is the same as in the previous example.

Variance of the estimator

The variance of the estimator $\bar{X}_n$ is
$$\mathrm{Var}[\bar{X}_n]=\frac{\sigma^{2}}{n}$$

Also in this case, the proof is the same as in the previous example.

Distribution of the estimator

Unlike in the previous example, the estimator $\bar{X}_n$ does not necessarily have a normal distribution: its distribution depends on those of the terms of the sequence $\{X_n\}$.

However, we will see below that $\bar{X}_n$ is asymptotically normal, that is, after appropriate standardization it converges in distribution to a normal random variable as $n$ becomes large.

Risk of the estimator

The mean squared error of the estimator is
$$\mathrm{MSE}(\bar{X}_n)=\frac{\sigma^{2}}{n}$$

The proof is the same as in the previous example.

Consistency of the estimator

Since the sequence $\{X_n\}$ is an IID sequence whose terms have finite mean, it satisfies the conditions of Kolmogorov's Strong Law of Large Numbers.

Therefore, the estimator $\bar{X}_n$ is both strongly consistent and weakly consistent (see the example above).

Asymptotic normality

The sequence $\{X_n\}$ is an IID sequence with finite mean and variance.

Therefore, it satisfies the conditions of the Lindeberg-Lévy Central Limit Theorem.

Hence, the sample mean $\bar{X}_n$ is asymptotically normal:
$$\sqrt{n}\,\frac{\bar{X}_n-\mu}{\sigma}\ \xrightarrow{d}\ Z$$
where $Z$ is a standard normal random variable and $\xrightarrow{d}$ denotes convergence in distribution.

In other words, for large $n$ the distribution of the sample mean $\bar{X}_n$ is approximately normal with mean $\mu$ and variance $\sigma^{2}/n$.
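
To see the Central Limit Theorem at work with non-normal data, the sketch below (an added illustration; the exponential distribution and all parameter values are arbitrary choices) standardizes simulated sample means of exponential draws and checks that they are approximately standard normal.

```python
import numpy as np

# Exponential draws are clearly non-normal, with mean mu = scale and variance sigma^2 = scale^2
# (the scale, sample size and number of replications are arbitrary illustration values)
scale, n, n_reps = 2.0, 200, 100_000
mu, sigma = scale, scale

rng = np.random.default_rng(3)
xbar = rng.exponential(scale=scale, size=(n_reps, n)).mean(axis=1)

# By the CLT, the standardized sample means are approximately standard normal
z = np.sqrt(n) * (xbar - mu) / sigma
print("mean of z:   ", z.mean(), " (close to 0)")
print("std of z:    ", z.std(), " (close to 1)")
print("P(z < 1.645):", np.mean(z < 1.645), " (standard normal: about 0.95)")
```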

Solved exercises

Below you can find some exercises with explained solutions.

Exercise 1

Consider an experiment that can have only two outcomes: either success, with probability $p$, or failure, with probability $1-p$.

The probability of success is unknown, but we know that [eq20]

Suppose that we can independently repeat the experiment as many times as we wish and use the ratio
$$\widehat{p}=\frac{\text{number of successes}}{\text{number of experiments}}$$
as an estimator of $p$.

What is the minimum number of experiments needed in order to be sure that the standard deviation of the estimator is less than 1/100?

Solution

Denote by $\widehat{p}$ the estimator of $p$. It can be written as
$$\widehat{p}=\frac{1}{n}\sum_{i=1}^{n}X_i$$
where $n$ is the number of repetitions of the experiment and $X_1,\dots,X_n$ are $n$ independent random variables having a Bernoulli distribution with parameter $p$. Therefore, $\widehat{p}$ is the sample mean of $n$ independent Bernoulli random variables with expected value $p$ and
$$\mathrm{Var}[X_i]=p(1-p)$$
Thus
$$\mathrm{Var}[\widehat{p}]=\frac{p(1-p)}{n}$$
We need to ensure that
$$\sqrt{\frac{p(1-p)}{n}}<\frac{1}{100}$$
or
$$n>10000\,p(1-p)$$
which, since $p(1-p)\leq 1/4$ for every value of $p$, is certainly verified if
$$n>\frac{10000}{4}$$
or
$$n>2500$$
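
A short numerical cross-check of the last step is sketched below (an addition, not part of the original solution). It uses the worst-case bound $p(1-p)\leq 1/4$; if the constraint on $p$ stated in the exercise implies a smaller bound on $p(1-p)$, fewer experiments would suffice.

```python
import math

# Worst-case value of p*(1-p); a tighter known bound on p would reduce this
max_p_times_q = 0.25
target_sd = 1 / 100

# Smallest integer n with sqrt(max_p_times_q / n) < target_sd,
# i.e. with n > max_p_times_q / target_sd**2
n_min = math.floor(max_p_times_q / target_sd**2) + 1
print("smallest n guaranteeing sd <", target_sd, ":", n_min)
print("worst-case sd at that n:", math.sqrt(max_p_times_q / n_min))
```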

Exercise 2

Suppose that you observe a sample of 100 independent draws from a distribution having unknown mean $\mu$ and known variance $\sigma^{2}=1$.

How can you approximate the distribution of their sample mean?

Solution

We can approximate the distribution of the sample mean with its asymptotic distribution. So, the distribution of the sample mean can be approximated by a normal distribution with mean $\mu$ and variance
$$\frac{\sigma^{2}}{n}=\frac{1}{100}$$
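
As an example of how this approximation can be used (an added illustration; the half-width of 0.2 is an arbitrary choice), the snippet below computes the approximate probability that the sample mean falls within 0.2 of the unknown mean $\mu$.

```python
from scipy.stats import norm

# Approximate distribution of the sample mean: N(mu, sigma^2/n) with sigma^2 = 1 and n = 100
n, sigma2 = 100, 1.0
sd_xbar = (sigma2 / n) ** 0.5   # = 0.1

# P(|Xbar - mu| <= 0.2) does not depend on mu, since Xbar - mu is approximately N(0, 0.1^2)
prob = norm.cdf(0.2 / sd_xbar) - norm.cdf(-0.2 / sd_xbar)
print("approximate P(|Xbar - mu| <= 0.2):", prob)   # about 0.954
```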

How to cite

Please cite as:

Taboga, Marco (2021). "Estimation of the mean", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/mean-estimation.
