
Point estimation


In the lecture entitled Statistical inference we have defined statistical inference as the act of using a sample to make statements about the probability distribution that generated the sample. The sample $\xi$ is regarded as the realization of a random vector $\Xi$, whose unknown joint distribution function, denoted by $F_{\Xi}(\xi)$, is assumed to belong to a set of distribution functions $\Phi$, called a statistical model (or, simply, a model).

When the model $\Phi$ is put into correspondence with a set $\Theta$ of real vectors, we have a parametric model. $\Theta$ is called the parameter space and its elements are called parameters. Denote by $\theta_0$ the parameter associated with the unknown distribution function $F_{\Xi}(\xi)$ and assume that $\theta_0$ is unique. $\theta_0$ is called the true parameter, because it is associated with the distribution that actually generated the sample. This lecture introduces a type of inference about the true parameter called point estimation.


Estimate and estimator

Roughly speaking, point estimation is the act of choosing a parameter $\widehat{\theta}\in\Theta$ which is our best guess of the true (and unknown) parameter $\theta_0$. Our best guess $\widehat{\theta}$ is called an estimate of $\theta_0$.

When the estimate $\widehat{\theta}$ is produced using a predefined rule (a function) that associates a parameter estimate $\widehat{\theta}$ to each $\xi$ in the support of $\Xi$, we can write $$\widehat{\theta}=\widehat{\theta}(\xi)$$

The function $\widehat{\theta}(\cdot)$ is called an estimator. Often, the symbol $\widehat{\theta}$ is used to denote both the estimate and the estimator. The meaning is usually clear from the context.
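To make the distinction concrete, here is a minimal sketch (not part of the original lecture; the function name is ours) in which the sample mean serves as an estimator of a distribution's mean. The function is the estimator; its value at one observed sample is the estimate.

```python
# Illustrative sketch: the sample mean as an estimator.
# The function (the predefined rule) is the estimator;
# its value at one observed sample is the estimate.

def sample_mean(xi):
    """A predefined rule mapping each sample xi to an estimate."""
    return sum(xi) / len(xi)

xi = [2.0, 4.0, 6.0, 8.0]    # one observed sample (a realization of Xi)
theta_hat = sample_mean(xi)  # the estimate produced by the rule
print(theta_hat)             # -> 5.0
```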

Estimation error, loss and risk

Using the decision-theoretic terminology introduced in the lecture entitled Statistical inference, making an estimate $\widehat{\theta}$ is an act, which produces consequences. Among the consequences that are usually considered in a parametric decision problem, the most relevant one is the estimation error.

The estimation error $e$ is the difference between the estimate $\widehat{\theta}$ and the true parameter $\theta_0$: $$e=\widehat{\theta}-\theta_0$$

Of course, the statistician's goal is to commit the smallest possible estimation error. This preference can be formalized using loss functions. A loss function $L(\widehat{\theta},\theta_0)$, mapping $\Theta\times\Theta$ into $\mathbb{R}$, quantifies the loss incurred by estimating $\theta_0$ with $\widehat{\theta}$.

Frequently used loss functions are:

  1. the absolute error: $$L(\widehat{\theta},\theta_0)=\left\Vert \widehat{\theta}-\theta_0\right\Vert$$ where $\left\Vert \cdot\right\Vert$ is the Euclidean norm (it coincides with the absolute value when the parameter is a scalar);

  2. the squared error: $$L(\widehat{\theta},\theta_0)=\left\Vert \widehat{\theta}-\theta_0\right\Vert^{2}$$
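As a hedged illustration (the function names are ours, not the lecture's), the two loss functions above can be computed for vector-valued parameters as follows:

```python
import math

# Sketch of the two loss functions for vector-valued parameters:
# absolute_error is the Euclidean norm of the estimation error,
# squared_error is its square.

def absolute_error(theta_hat, theta_0):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_hat, theta_0)))

def squared_error(theta_hat, theta_0):
    return sum((a - b) ** 2 for a, b in zip(theta_hat, theta_0))

print(absolute_error([3.0, 4.0], [0.0, 0.0]))  # -> 5.0
print(squared_error([3.0, 4.0], [0.0, 0.0]))   # -> 25.0
```

For a scalar parameter the absolute error reduces to the ordinary absolute value, as noted above.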

When the estimate $\widehat{\theta}$ is obtained from an estimator (a function of the sample $\xi$, which in turn is a realization of the random vector $\Xi$), then the loss $L(\widehat{\theta},\theta_0)$ can be thought of as a random variable. Its expected value is called the statistical risk (or, simply, the risk) of the estimator $\widehat{\theta}$ and is denoted by $R(\widehat{\theta})$: $$R(\widehat{\theta})=\mathbb{E}\left[L(\widehat{\theta},\theta_0)\right]$$

Note that the expected value in the above definition of risk is computed with respect to the true distribution function $F_{\Xi}(\xi)$. Therefore, in order to compute the risk $R(\widehat{\theta})$, we need to know not only the true parameter $\theta_0$, but also the distribution function of $\Xi$ (i.e. $F_{\Xi}(\xi)$).

In practice, the risk is never known, because $\theta_0$ and $F_{\Xi}(\xi)$ are unknown, so the risk itself needs to be estimated. For example, we can compute an estimate $\widehat{R}$ of the risk by pretending that the estimate $\widehat{\theta}$ were the true parameter: denoting by $\widehat{\widehat{\theta}}$ the estimator of $\widehat{\theta}$, we compute the estimated risk as $$\widehat{R}=\widehat{\mathbb{E}}\left[L(\widehat{\widehat{\theta}},\widehat{\theta})\right]$$ where the expected value is computed with respect to the estimated distribution function $F_{\Xi}(\xi;\widehat{\theta})$.

Even if the risk is unknown, the notion of risk is often used to derive theoretical properties of estimators. In any case, parameter estimation is always guided, at least ideally, by the principle of risk minimization, i.e. by the search for estimators $\widehat{\theta}$ that minimize the risk $R(\widehat{\theta})$.

Depending on the specific loss function we use, the statistical risk of an estimator can take different names:

  1. when the absolute error is used as a loss function, the risk $$R(\widehat{\theta})=\mathbb{E}\left[\left\Vert \widehat{\theta}-\theta_0\right\Vert\right]$$ is called the mean absolute error of the estimator;

  2. when the squared error is used as a loss function, the risk $$R(\widehat{\theta})=\mathbb{E}\left[\left\Vert \widehat{\theta}-\theta_0\right\Vert^{2}\right]$$ is called the mean squared error (MSE). The square root of the mean squared error is called the root mean squared error (RMSE).
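Since the risk is an expected value under the true distribution, it can be approximated by simulation when we control the data-generating process. The sketch below (illustrative only; in practice $\theta_0$ is unknown, as stressed above) approximates the MSE of the sample-mean estimator when the data are $N(\theta_0,1)$, where the theoretical MSE is $1/n$.

```python
import math
import random

# Monte Carlo sketch: approximate the MSE (the risk under squared-error
# loss) of the sample-mean estimator for N(theta_0, 1) data, and compare
# it with the theoretical value 1/n. Illustrative only: in practice
# theta_0 and the true distribution are unknown.

random.seed(0)
theta_0 = 2.0    # true mean, known here only because we simulate
n = 25           # sample size
reps = 20000     # Monte Carlo repetitions

losses = []
for _ in range(reps):
    xi = [random.gauss(theta_0, 1.0) for _ in range(n)]
    theta_hat = sum(xi) / n
    losses.append((theta_hat - theta_0) ** 2)   # squared-error loss

mse = sum(losses) / reps   # approximates E[(theta_hat - theta_0)^2]
rmse = math.sqrt(mse)
print(round(mse, 3))       # close to the theoretical MSE 1/n = 0.04
```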

Other criteria to evaluate estimators

In this section we discuss other criteria that are commonly used to evaluate estimators.


Unbiasedness

If an estimator produces parameter estimates that are on average correct, then it is said to be unbiased. The following is a formal definition.

Definition Let $\theta_0$ be the true parameter and let $\widehat{\theta}$ be an estimator of $\theta_0$. $\widehat{\theta}$ is an unbiased estimator of $\theta_0$ if and only if $$\mathbb{E}\left[\widehat{\theta}\right]=\theta_0$$ If an estimator is not unbiased, then it is called a biased estimator.

Note that in the above definition of unbiasedness $\mathbb{E}\left[\widehat{\theta}\right]$ is a shorthand for $$\mathbb{E}\left[\widehat{\theta}(\Xi)\right]$$ where $\Xi$ is the random vector of which the sample $\xi$ is a realization and the expected value is computed with respect to the true distribution function $F_{\Xi}(\xi)$.

Also note that if an estimator is unbiased, this implies that the estimation error is on average zero: $$\mathbb{E}\left[e\right]=\mathbb{E}\left[\widehat{\theta}-\theta_0\right]=0$$
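A classic way to see bias at work is the sample variance. The simulation below (an illustrative sketch with parameters chosen by us, not taken from the lecture) checks that the sample mean is unbiased for the true mean, while the variance estimator that divides by $n$ is biased downward; dividing by $n-1$ removes the bias.

```python
import random

# Simulation sketch: the sample mean is unbiased for the true mean;
# the variance estimator dividing by n is biased (E = (n-1)/n * sigma2),
# while dividing by n - 1 makes it unbiased (E = sigma2).

random.seed(1)
theta_0 = 3.0    # true mean
sigma2 = 4.0     # true variance
n = 10
reps = 50000

mean_hats, var_n, var_n1 = [], [], []
for _ in range(reps):
    xi = [random.gauss(theta_0, sigma2 ** 0.5) for _ in range(n)]
    m = sum(xi) / n
    ss = sum((x - m) ** 2 for x in xi)
    mean_hats.append(m)
    var_n.append(ss / n)          # biased: expected value 3.6
    var_n1.append(ss / (n - 1))   # unbiased: expected value 4.0

avg = lambda xs: sum(xs) / len(xs)
print(round(avg(mean_hats), 2))   # near 3.0
print(round(avg(var_n), 2))       # near 3.6
print(round(avg(var_n1), 2))      # near 4.0
```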


Consistency

If an estimator produces parameter estimates that converge to the true value as the sample size increases, then it is said to be consistent. The following is a formal definition.

Definition Let $\left\{\xi_n\right\}$ be a sequence of samples such that all the distribution functions $F_{\Xi_n}(\xi_n)$ are put into correspondence with the same parameter $\theta_0$. A sequence of estimators $\left\{\widehat{\theta}_n\right\}$ is said to be consistent (or weakly consistent) if and only if $$\operatorname*{plim}_{n\rightarrow\infty}\widehat{\theta}_n=\theta_0$$ where $\operatorname{plim}$ indicates convergence in probability. The sequence of estimators is said to be strongly consistent if and only if $$\widehat{\theta}_n\overset{a.s.}{\longrightarrow}\theta_0$$ where $\overset{a.s.}{\longrightarrow}$ indicates almost sure convergence. A sequence of estimators which is not consistent is called inconsistent.

When the sequence of estimators is obtained using the same predefined rule for every sample $\xi_n$, we often say, with a slight abuse of language, "consistent estimator" instead of "consistent sequence of estimators". In such cases, what we mean is that the predefined rule produces a consistent sequence of estimators.
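Consistency can also be seen numerically. The sketch below (illustrative, with settings chosen by us) applies the sample-mean rule to samples of growing size and shows the average absolute estimation error shrinking toward zero, in line with convergence in probability.

```python
import random

# Consistency sketch: for the sample-mean rule applied to N(theta_0, 1)
# data, the average absolute estimation error shrinks as the sample
# size n grows (it is roughly proportional to 1/sqrt(n)).

random.seed(2)
theta_0 = 1.0
reps = 200   # repetitions used to average the absolute error

def avg_abs_error(n):
    total = 0.0
    for _ in range(reps):
        xi = [random.gauss(theta_0, 1.0) for _ in range(n)]
        total += abs(sum(xi) / n - theta_0)
    return total / reps

errors = [avg_abs_error(n) for n in (10, 100, 10000)]
print([round(e, 4) for e in errors])   # decreasing in n
```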


You can find examples of point estimation in the lectures entitled Point estimation of the mean and Point estimation of the variance.
