In the lecture entitled Statistical inference we defined statistical inference as the act of using a sample to make statements about the probability distribution that generated the sample. The sample $\xi$ is regarded as the realization of a random vector $\Xi$, whose unknown joint distribution function, denoted by $F_{\Xi}(\xi)$, is assumed to belong to a set of distribution functions $\Phi$, called a statistical model (or, simply, a model).
When the model $\Phi$ is put into correspondence with a set $\Theta \subseteq \mathbb{R}^p$ of real vectors, then we have a parametric model. $\Theta$ is called the parameter space and its elements are called parameters. Denote by $\theta_0$ the parameter that is associated with the unknown distribution function $F_{\Xi}(\xi)$ and assume that $\theta_0$ is unique. $\theta_0$ is called the true parameter, because it is associated with the distribution that actually generated the sample. This lecture introduces a type of inference about the true parameter called point estimation.
Roughly speaking, point estimation is the act of choosing a parameter $\hat{\theta} \in \Theta$ which is our best guess of the true (and unknown) parameter $\theta_0$. Our best guess $\hat{\theta}$ is called an estimate of $\theta_0$.
When the estimate $\hat{\theta}$ is produced using a predefined rule (a function) that associates a parameter estimate $\hat{\theta}$ to each $\xi$ in the support of $\Xi$, we can write:
$$\hat{\theta} = \hat{\theta}(\xi)$$
The function $\hat{\theta}(\xi)$ is called an estimator. Often, the symbol $\hat{\theta}$ is used to denote both the estimate and the estimator. The meaning is usually clear from the context.
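
The distinction between an estimator (a rule) and an estimate (the value the rule returns for one observed sample) can be sketched in code. The example below is only an illustration: the choice of the sample mean as the rule, the function name `sample_mean_estimator` and the numbers in `xi` are all assumptions made here, not part of the lecture.

```python
import numpy as np

def sample_mean_estimator(xi):
    """Estimator: a predefined rule mapping any sample xi to a parameter estimate."""
    return float(np.mean(xi))

# Hypothetical observed sample (made-up numbers, for illustration only).
xi = np.array([2.1, 1.9, 2.4, 2.0, 1.6])

# Estimate: the value taken by the estimator at the observed sample.
theta_hat = sample_mean_estimator(xi)
print(theta_hat)
```

Applying the same function to a different sample would give a different estimate, while the estimator itself stays the same.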
Using the decision-theoretic terminology introduced in the lecture entitled Statistical inference, making an estimate $\hat{\theta}$ is an act, which produces consequences. Among the consequences that are usually considered in a parametric decision problem, the most relevant one is the estimation error. The estimation error $e$ is the difference between the estimate $\hat{\theta}$ and the true parameter $\theta_0$:
$$e = \hat{\theta} - \theta_0$$
Of course, the statistician's goal is to commit the smallest possible estimation error. This preference can be formalized using loss functions. A loss function $L(\hat{\theta}, \theta_0)$, mapping $\Theta \times \Theta$ into $\mathbb{R}$, quantifies the loss incurred by estimating $\theta_0$ with $\hat{\theta}$.
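Frequently used loss functions are:
The absolute error:
$$L(\hat{\theta}, \theta_0) = \lVert \hat{\theta} - \theta_0 \rVert$$
where $\lVert \cdot \rVert$ is the Euclidean norm (it coincides with the absolute value when $\Theta \subseteq \mathbb{R}$).
The squared error:
$$L(\hat{\theta}, \theta_0) = \lVert \hat{\theta} - \theta_0 \rVert^2$$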
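
The two loss functions can be written as short functions. This is a minimal sketch; the names `absolute_error` and `squared_error` are chosen here for illustration, and scalars are treated as one-dimensional vectors so that the Euclidean norm reduces to the absolute value.

```python
import numpy as np

def absolute_error(theta_hat, theta_0):
    """Euclidean norm of the estimation error (absolute value for scalars)."""
    diff = np.asarray(theta_hat, dtype=float) - np.asarray(theta_0, dtype=float)
    return float(np.linalg.norm(np.atleast_1d(diff)))

def squared_error(theta_hat, theta_0):
    """Squared Euclidean norm of the estimation error."""
    return absolute_error(theta_hat, theta_0) ** 2

# Scalar parameter: the norm is just the absolute value of the error.
print(absolute_error(1.5, 2.0), squared_error(1.5, 2.0))

# Vector parameter: the norm is the Euclidean distance between the two vectors.
print(absolute_error([3.0, 4.0], [0.0, 0.0]), squared_error([3.0, 4.0], [0.0, 0.0]))
```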
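When the estimate $\hat{\theta}$ is obtained from an estimator (a function of the sample $\xi$, which in turn is a realization of the random vector $\Xi$), then the loss:
$$L(\hat{\theta}(\Xi), \theta_0)$$
can be thought of as a random variable. Its expected value is called the statistical risk (or, simply, the risk) of the estimator $\hat{\theta}$ and it is denoted by $R(\hat{\theta})$:
$$R(\hat{\theta}) = \operatorname{E}\left[L(\hat{\theta}(\Xi), \theta_0)\right]$$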
Note that the expected value in the above definition of risk is computed with respect to the true distribution function $F_{\Xi}(\xi)$. Therefore, in order to compute the risk $R(\hat{\theta})$, we need to know not only the true parameter $\theta_0$, but also the distribution function of $\Xi$ (i.e. $F_{\Xi}(\xi)$).
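
To see what computing the risk involves when the true distribution is known, the expected loss can be approximated by simulation. The sketch below assumes, purely for illustration, that the sample consists of `n` independent normal draws with true mean `theta_0` and known standard deviation `sigma`, that the estimator is the sample mean, and that the loss is the squared error. Under these assumptions the exact risk is `sigma**2 / n`, which gives a value to compare against.

```python
import numpy as np

rng = np.random.default_rng(0)

theta_0 = 2.0     # assumed true parameter (the mean)
sigma = 1.0       # assumed known standard deviation
n = 20            # sample size
n_sim = 100_000   # number of simulated samples

# Each row is one realization of the random vector that generates the sample.
samples = rng.normal(theta_0, sigma, size=(n_sim, n))

# Apply the estimator (sample mean) to every simulated sample.
estimates = samples.mean(axis=1)

# Squared-error loss of each estimate, then its average over the simulations.
losses = (estimates - theta_0) ** 2
risk_mc = float(losses.mean())

risk_exact = sigma**2 / n  # variance of the sample mean of n i.i.d. draws
print(risk_mc, risk_exact)
```

Both `theta_0` and the distribution of the sample enter the computation, which is why the risk cannot be computed this way in a real application.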
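In practice, the risk is never known, because $\theta_0$ and $F_{\Xi}(\xi)$ are unknown, so the risk also needs to be estimated. For example, we can compute an estimate $\hat{R}$ of the risk by pretending that the estimate $\hat{\theta}$ were the true parameter, denoting by $\tilde{\theta}$ the estimator of $\hat{\theta}$ and computing the estimated risk as:
$$\hat{R}(\hat{\theta}) = \hat{\operatorname{E}}\left[L(\tilde{\theta}, \hat{\theta})\right]$$
where the expected value is with respect to the estimated distribution function $F_{\Xi}(\xi; \hat{\theta})$.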
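
One way to carry out this plug-in idea is by simulation from the estimated distribution. The sketch below is an illustration under stated assumptions, not a prescribed procedure: the observed sample `xi` is made up, the model is assumed to be normal with known standard deviation `sigma`, the estimator is the sample mean and the loss is the squared error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observed sample (made-up numbers, for illustration only).
xi = np.array([2.1, 1.9, 2.4, 2.0, 1.6])
n = xi.size
sigma = 1.0  # assumed known standard deviation of each observation

# Step 1: compute the estimate and pretend it is the true parameter.
theta_hat = float(xi.mean())

# Step 2: simulate samples from the estimated distribution N(theta_hat, sigma^2).
n_sim = 50_000
simulated = rng.normal(theta_hat, sigma, size=(n_sim, n))

# Step 3: apply the same estimator to each simulated sample (theta_tilde).
theta_tilde = simulated.mean(axis=1)

# Step 4: average the squared-error loss of theta_tilde relative to theta_hat.
risk_hat = float(np.mean((theta_tilde - theta_hat) ** 2))
print(risk_hat)
```

In this particular model the risk of the sample mean does not depend on the mean, so the estimated risk should be close to `sigma**2 / n`; in other models the plug-in step can introduce additional error.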
Even if the risk is unknown, the notion of risk is often used to derive theoretical properties of estimators. In any case, parameter estimation is always guided, at least ideally, by the principle of risk minimization, i.e. by the search for estimators $\hat{\theta}$ that minimize the risk $R(\hat{\theta})$.
Depending on the specific loss function we use, the statistical risk of an estimator can take different names:
When the absolute error is used as a loss function, the risk
$$R(\hat{\theta}) = \operatorname{E}\left[\lVert \hat{\theta} - \theta_0 \rVert\right]$$
is called the mean absolute error of the estimator.
When the squared error is used as a loss function, the risk
$$R(\hat{\theta}) = \operatorname{E}\left[\lVert \hat{\theta} - \theta_0 \rVert^2\right]$$
is called the mean squared error (MSE). The square root of the mean squared error is called the root mean squared error (RMSE).
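
As a small worked example of these two quantities, suppose (as an assumption made only for this illustration) that the sample consists of $n$ independent draws with mean $\theta_0$ and variance $\sigma^2$, and that the estimator is the sample mean $\bar{\Xi}_n$. Because the sample mean has expected value $\theta_0$, its mean squared error equals its variance:

```latex
\operatorname{MSE}(\bar{\Xi}_n)
  = \operatorname{E}\left[(\bar{\Xi}_n - \theta_0)^2\right]
  = \operatorname{Var}\left[\bar{\Xi}_n\right]
  = \frac{\sigma^2}{n},
\qquad
\operatorname{RMSE}(\bar{\Xi}_n) = \frac{\sigma}{\sqrt{n}}
```

For instance, with $\sigma = 2$ and $n = 25$ the MSE is $4/25 = 0.16$ and the RMSE is $0.4$. The RMSE is expressed in the same units as the parameter, which often makes it easier to interpret than the MSE.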
In this section we discuss other criteria that are commonly used to evaluate estimators.
If an estimator produces parameter estimates that are on average correct, then it is said to be unbiased. The following is a formal definition:
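Definition (unbiasedness)
Let $\theta_0$ be the true parameter and let $\hat{\theta}$ be an estimator of $\theta_0$. $\hat{\theta}$ is an unbiased estimator of $\theta_0$ if and only if:
$$\operatorname{E}[\hat{\theta}] = \theta_0$$
If an estimator is not unbiased, then it is called a biased estimator.
Note that in the above definition of unbiasedness $\operatorname{E}[\hat{\theta}]$ is a shorthand for:
$$\operatorname{E}[\hat{\theta}(\Xi)]$$
where $\Xi$ is the random vector of which the sample $\xi$ is a realization and the expected value is computed with respect to the true distribution function $F_{\Xi}(\xi)$.
Also note that if an estimator is unbiased, this implies that the estimation error is on average zero:
$$\operatorname{E}[e] = \operatorname{E}[\hat{\theta} - \theta_0] = \operatorname{E}[\hat{\theta}] - \theta_0 = 0$$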
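
The difference between a biased and an unbiased estimator can be checked numerically by averaging the estimation error over many simulated samples. The sketch below assumes, for illustration only, normal samples with mean zero and true variance `theta_0`, and compares two estimators of the variance: the one that divides by `n` and the one that divides by `n - 1`.

```python
import numpy as np

rng = np.random.default_rng(2)

theta_0 = 4.0     # assumed true parameter (the variance)
n = 5             # sample size
n_sim = 200_000   # number of simulated samples

samples = rng.normal(0.0, np.sqrt(theta_0), size=(n_sim, n))

# Two estimators of the variance applied to every simulated sample.
unadjusted = samples.var(axis=1, ddof=0)  # divides by n
adjusted = samples.var(axis=1, ddof=1)    # divides by n - 1

# Average estimation error of each estimator across the simulations.
mean_error_unadjusted = float(unadjusted.mean() - theta_0)
mean_error_adjusted = float(adjusted.mean() - theta_0)
print(mean_error_unadjusted, mean_error_adjusted)
```

Under these assumptions the estimator that divides by `n` has expected value `(n - 1) / n * theta_0`, so its average error should be close to `-theta_0 / n`, while the average error of the estimator that divides by `n - 1` should be close to zero.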
If an estimator produces parameter estimates that converge to the true value when the sample size increases, then it is said to be consistent. The following is a formal definition:
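Definition (consistency)
Let $\{\xi_n\}$ be a sequence of samples such that all the distribution functions $F_{\Xi_n}(\xi_n)$ are put into correspondence with the same parameter $\theta_0$. A sequence of estimators $\{\hat{\theta}_n(\xi_n)\}$ is said to be consistent (or weakly consistent) if and only if:
$$\operatorname*{plim}_{n \to \infty} \hat{\theta}_n = \theta_0$$
where $\operatorname{plim}$ indicates convergence in probability. The sequence $\{\hat{\theta}_n(\xi_n)\}$ of estimators is said to be strongly consistent if and only if:
$$\hat{\theta}_n \xrightarrow{\text{a.s.}} \theta_0$$
where $\xrightarrow{\text{a.s.}}$ indicates almost sure convergence. A sequence of estimators which is not consistent is called inconsistent.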
When the sequence of estimators is obtained using the same predefined rule for every sample $\xi_n$, we often say, with a slight abuse of language, "consistent estimator" instead of saying "consistent sequence of estimators". In such cases, what we mean is that the predefined rule produces a consistent sequence of estimators.
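
Weak consistency can be illustrated by estimating, for increasing sample sizes, how often the estimate falls farther than a fixed tolerance from the true parameter. The sketch below is a simulation under assumptions chosen here for illustration: independent normal draws with true mean `theta_0` and unit variance, the sample mean as the rule, and a tolerance `eps` of 0.1. A simulation can only suggest convergence in probability; it does not prove it.

```python
import numpy as np

rng = np.random.default_rng(3)

theta_0 = 2.0    # assumed true parameter (the mean)
eps = 0.1        # tolerance used in the definition of convergence in probability
n_sim = 2_000    # number of simulated samples per sample size
sample_sizes = [10, 100, 1000]

exceed_freq = []
for n in sample_sizes:
    samples = rng.normal(theta_0, 1.0, size=(n_sim, n))
    estimates = samples.mean(axis=1)  # same rule applied at every sample size
    # Relative frequency of estimates farther than eps from the true parameter.
    exceed_freq.append(float(np.mean(np.abs(estimates - theta_0) > eps)))

for n, freq in zip(sample_sizes, exceed_freq):
    print(n, freq)
```

If the sequence of estimators is consistent, the printed frequencies should shrink toward zero as the sample size grows, for any fixed choice of `eps`.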
You can find examples of point estimation in the lectures entitled Point estimation of the mean and Point estimation of the variance.