
Statistical inference

Statistical inference is the act of using observed data to infer unknown properties and characteristics of the probability distribution from which the observed data have been generated. The set of data used to make inferences is called a sample.

Samples

In the simplest possible case, we observe the realizations $x_{1}$, ..., $x_{n}$ of $n$ independent random variables $X_{1}$, ..., $X_{n}$ having a common distribution function $F_{X}(x)$, and we use the observed realizations to infer some characteristics of $F_{X}(x)$. With a slight abuse of language, we sometimes say "$n$ independent realizations of a random variable $X$" instead of saying "the realizations of $n$ independent random variables $X_{1}$, ..., $X_{n}$ having a common distribution function $F_{X}(x)$".

Example The lifetime of a certain type of electronic device is a random variable $X$, whose distribution function $F_{X}(x)$ is unknown. Suppose we independently observe the lifetimes of $10$ devices. Denote these realizations by $x_{1}$, $x_{2}$, ..., $x_{10}$. We are interested in the expected value of $X$, which is an unknown characteristic of $F_{X}(x)$. We infer $\mathrm{E}[X]$ from the data, estimating it with the sample mean:$$\widehat{\mathrm{E}[X]}=\frac{1}{10}\sum_{i=1}^{10}x_{i}$$In this simple example the observed data $x_{1}$, $x_{2}$, ..., $x_{10}$ constitute our sample, and $\mathrm{E}[X]$ is the quantity about which we are making a statistical inference.
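
To make the computation concrete, here is a minimal Python sketch of the estimate above; the ten lifetime values are invented for illustration, and only the sample-mean formula comes from the example.

```python
# Hypothetical observed lifetimes x_1, ..., x_10 (in hours); the values
# are made up purely to illustrate the computation.
lifetimes = [480.2, 512.7, 399.8, 604.1, 455.0,
             530.9, 587.3, 421.6, 498.4, 510.0]

# Estimate E[X] with the sample mean (1/10) * sum of the x_i.
sample_mean = sum(lifetimes) / len(lifetimes)
print(f"Estimated expected lifetime: {sample_mean:.1f} hours")
```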

While in the simplest case $X_{1}$, ..., $X_{n}$ are independent random variables, more complicated cases are possible. For example:

  1. $X_{1}$, ..., $X_{n}$ are not independent;

  2. $X_{1}$, ..., $X_{n}$ are random vectors having a common joint distribution function $F_{X}(x)$;

  3. $X_{1}$, ..., $X_{n}$ do not have a common probability distribution.

Is there a definition of sample that generalizes all of the above special cases? Fortunately, there is one and it is extremely simple:

Definition A sample $\xi$ is the realization of a random vector $\Xi$.

The distribution function of $\Xi$, denoted by $F_{\Xi}(\xi)$, is the unknown distribution function that constitutes the object of inference.

Therefore, 'sample' is just a synonym of 'realization of a random vector'. The following examples show how this general definition accommodates the special cases mentioned above:

Example When we observe $n$ realizations $x_{1}$, ..., $x_{n}$ of $n$ independent random variables $X_{1}$, ..., $X_{n}$ having a common distribution function $F_{X}(x)$, the sample is the $n$-dimensional vector $\xi=(x_{1},\ldots,x_{n})$, which is a realization of the random vector $\Xi=(X_{1},\ldots,X_{n})$. The joint distribution function of $\Xi$ is:$$F_{\Xi}(\xi)=F_{\Xi}(x_{1},\ldots,x_{n})=\prod_{i=1}^{n}F_{X}(x_{i})$$
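
The factorization can be verified numerically. The sketch below assumes, purely for concreteness, that the common marginal distribution is exponential with rate 1 (this choice is not in the text); SciPy supplies the distribution function $F_{X}$.

```python
import numpy as np
from scipy.stats import expon

# A hypothetical sample (x_1, ..., x_n) from the common distribution.
xi = np.array([0.5, 1.2, 0.3, 2.1])

# Under independence, F_Xi(xi) equals the product of the marginal
# distribution functions evaluated at each observation.
joint_cdf = np.prod(expon.cdf(xi))
print(joint_cdf)
```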

Example When we observe $n$ realizations $x_{1}$, ..., $x_{n}$ of $n$ random variables $X_{1}$, ..., $X_{n}$ that are not independent but have a common distribution function $F_{X}(x)$, the sample is again the $n$-dimensional vector $\xi=(x_{1},\ldots,x_{n})$, which is a realization of the random vector $\Xi=(X_{1},\ldots,X_{n})$. However, in this case the joint distribution function $F_{\Xi}(\xi)$ can no longer be written as the product of the distribution functions of $X_{1}$, ..., $X_{n}$.
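
As a concrete instance (our illustration, not from the original text): if $n=2$ and $(X_{1},X_{2})$ is bivariate normal with standard normal marginals and correlation $\rho\neq 0$, then $F_{\Xi}(x_{1},x_{2})\neq F_{X}(x_{1})F_{X}(x_{2})$, even though the two variables share the same marginal distribution function.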

Example When we observe $n$ realizations $x_{1}$, ..., $x_{n}$ of $n$ independent $K$-dimensional random vectors $X_{1}$, ..., $X_{n}$ having a common joint distribution function $F_{X}(x)$, the sample is the $nK$-dimensional vector $\xi=(x_{1},\ldots,x_{n})$, which is a realization of the random vector $\Xi=(X_{1},\ldots,X_{n})$. The joint distribution function of $\Xi$ is:$$F_{\Xi}(\xi)=\prod_{i=1}^{n}F_{X}(x_{i})$$

Example When we observe $n$ realizations $x_{1}$, ..., $x_{n}$ of $n$ independent $K$-dimensional random vectors $X_{1}$, ..., $X_{n}$ having different joint distribution functions $F_{X_{1}}(x)$, ..., $F_{X_{n}}(x)$, the sample is the $nK$-dimensional vector $\xi=(x_{1},\ldots,x_{n})$, which is a realization of the random vector $\Xi=(X_{1},\ldots,X_{n})$. The joint distribution function of $\Xi$ is:$$F_{\Xi}(\xi)=\prod_{i=1}^{n}F_{X_{i}}(x_{i})$$

When the sample is made of $n$ realizations $x_{1}$, ..., $x_{n}$ of $n$ random variables (or random vectors):$$\xi=(x_{1},\ldots,x_{n})$$then we say that the sample has size $n$ (or that the sample size is $n$). An individual realization $x_{i}$ is referred to as an observation from the sample.

Statistical models

In the previous section we defined a sample $\xi$ as a realization of a random vector $\Xi$ having joint distribution function $F_{\Xi}(\xi)$. The sample $\xi$ is used to infer some characteristics of $F_{\Xi}(\xi)$ that are not fully known by the statistician. The properties and characteristics of $F_{\Xi}(\xi)$ that are already known (or are assumed to be known) before observing the sample are called a model for $\Xi$. In mathematical terms, a model for $\Xi$ is a set of joint distribution functions to which $F_{\Xi}(\xi)$ is assumed to belong:

Definition Let the sample $\xi$ be a realization of an $l$-dimensional random vector $\Xi$ having joint distribution function $F_{\Xi}(\xi)$. Let $\Psi$ be the set of all $l$-dimensional joint distribution functions:$$\Psi=\left\{ F(\xi):F(\xi)\text{ is an }l\text{-dimensional joint distribution function}\right\}$$A subset $\Phi\subseteq\Psi$ is called a statistical model (or a model specification or, simply, a model) for $\Xi$. If $F_{\Xi}\in\Phi$, the model is said to be correctly specified (or well-specified). Otherwise, if $F_{\Xi}\notin\Phi$, the model is said to be mis-specified.
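
As a concrete illustration (the distributional families here are our assumption, not part of the definition): let $l=1$ and let the model be the set of exponential distribution functions,$$\Phi=\left\{ F\in\Psi:F(\xi)=1-e^{-\lambda\xi}\text{ for }\xi\geq 0\text{, for some }\lambda>0\right\}$$If the true $F_{\Xi}$ is exponential, then $F_{\Xi}\in\Phi$ and the model is correctly specified; if the true $F_{\Xi}$ is, say, lognormal, then $F_{\Xi}\notin\Phi$ and the model is mis-specified.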

Continuing the examples of the previous section:

Example Suppose our sample is made of $n$ realizations $x_{1}$, ..., $x_{n}$ of $n$ random variables $X_{1}$, ..., $X_{n}$. Assume that the $n$ random variables are mutually independent and that they have a common distribution function $F_{X}(x)$. The sample is the $n$-dimensional vector $\xi=(x_{1},\ldots,x_{n})$. $\Psi$ is the set of all possible distribution functions of the random vector $\Xi=(X_{1},\ldots,X_{n})$. Recalling the definition of marginal distribution function and the characterization of mutual independence, the statistical model $\Phi$ is defined as follows:$$\Phi=\left\{ F\in\Psi:F(x_{1},\ldots,x_{n})=\prod_{i=1}^{n}G(x_{i})\text{ for some distribution function }G(x)\right\}$$

Example Take the example above and drop the assumption that the $n$ random variables $X_{1}$, ..., $X_{n}$ are mutually independent. The statistical model $\Phi$ is now the set of joint distribution functions whose $n$ marginal distribution functions coincide:$$\Phi=\left\{ F\in\Psi:F_{X_{1}}=F_{X_{2}}=\ldots=F_{X_{n}}\text{, where }F_{X_{i}}\text{ is the }i\text{-th marginal of }F\right\}$$

The next subsections introduce some terminology related to model specification.

Parametric models

A model $\Phi$ for $\Xi$ is called a parametric model if the joint distribution functions belonging to $\Phi$ are put into correspondence with a set $\Theta$ of real vectors:

Definition Let $\Phi$ be a model for $\Xi$. Let $\Theta\subseteq\mathbb{R}^{p}$ be a set of $p$-dimensional real vectors. Let $\gamma$ be a correspondence that associates a subset of $\Phi$ to each $\theta\in\Theta$. The triple $\left(\Phi,\Theta,\gamma\right)$ is a parametric model if and only if:$$\Phi=\bigcup_{\theta\in\Theta}\gamma(\theta)$$The set $\Theta$ is called the parameter space. A vector $\theta\in\Theta$ is called a parameter.

Therefore, in a parametric model every element of $\Phi$ is put into correspondence with at least one parameter $\theta$.
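
For example (our illustration): take $\Theta=\mathbb{R}$ and let $\gamma(\mu)$ be the set of all normal distribution functions with mean $\mu$ (one for each value of the variance). Then $\gamma$ associates a whole subset of $\Phi$ to each parameter, so $\left(\Phi,\Theta,\gamma\right)$ is a parametric model for the set $\Phi$ of all normal distribution functions, but $\gamma$ is not a function.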

When $\gamma$ associates to each parameter a unique joint distribution function (i.e., when $\gamma$ is a function), the parametric model is called a parametric family:

Definition Let $\left(\Phi,\Theta,\gamma\right)$ be a parametric model. If $\gamma$ is a function from $\Theta$ to $\Phi$, then the parametric model is called a parametric family. In this case, the joint distribution function associated to a parameter $\theta$ is denoted by $F(\xi;\theta)$.

When each distribution function is associated with only one parameter, the parametric family is said to be identifiable:

Definition Let $\left(\Phi,\Theta,\gamma\right)$ be a parametric family. If $\gamma$ is one-to-one (i.e., each distribution function $F$ is associated with only one parameter), then the parametric family is said to be identifiable.
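
Continuing the exponential illustration (again an assumption chosen for concreteness): take $\Theta=(0,\infty)$ and$$\gamma(\lambda)=F(\xi;\lambda)=1-e^{-\lambda\xi}\text{ for }\xi\geq 0$$Each $\lambda$ is associated with exactly one distribution function, so $\gamma$ is a function and the model is a parametric family; distinct values of $\lambda$ yield distinct distribution functions, so $\gamma$ is one-to-one and the family is identifiable. By contrast, the redundant parametrization $\theta=(\lambda,c)$ with $\gamma(\lambda,c)=F(\xi;\lambda)$ for every $c$ would define a family that is not identifiable.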

Statistical inferences

A statistical inference is a statement about the unknown distribution function $F_{\Xi}(\xi)$, based on the observed sample $\xi$ and the statistical model $\Phi$. Statistical inferences are often chosen among a set of possible inferences and take the form of model restrictions. Given a subset $\Phi_{R}\subseteq\Phi$ of the original model, a model restriction can be either an inclusion restriction:$$F_{\Xi}\in\Phi_{R}$$or an exclusion restriction:$$F_{\Xi}\notin\Phi_{R}$$
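
For instance (our illustration): if $\Phi$ is the set of all normal distributions with unit variance and unknown mean $\mu$, and $\Phi_{R}$ is the subset with $\mu=0$, then the inclusion restriction $F_{\Xi}\in\Phi_{R}$ states that the mean is zero, while the exclusion restriction $F_{\Xi}\notin\Phi_{R}$ states that it is not.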

The following are common kinds of statistical inferences (a code sketch illustrating all three follows the list):

  1. In hypothesis testing, a restriction $F_{\Xi}\in\Phi_{R}$ is proposed and the choice is between two possible statements:

    1. reject the restriction;

    2. do not reject the restriction.

  2. In estimation, a restriction $F_{\Xi}\in\Phi_{R}$ must be chosen among a set of possible restrictions.

  3. In Bayesian inference, the observed sample $\xi$ is used to update the subjective probability that a restriction $F_{\Xi}\in\Phi_{R}$ is true.
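
The following Python sketch illustrates the three kinds of inference on a single assumed setup: an exponential model with unknown rate $\lambda$, made-up data, a conventional 5% test level, and a Gamma prior chosen for conjugacy. None of these choices comes from the text.

```python
import numpy as np
from scipy import stats

# Hypothetical observed sample (values are made up for illustration).
x = np.array([0.8, 1.5, 0.4, 2.2, 1.1, 0.9, 1.7, 0.6, 1.3, 1.0])
n, s = len(x), x.sum()

# 1. Estimation: choose one restriction (a single rate) among all
#    possible rates; the maximum-likelihood estimate is 1 / sample mean.
lam_hat = 1.0 / x.mean()

# 2. Hypothesis testing: propose the restriction lambda = 1 and decide
#    whether to reject it. Under the restriction, sum(x) ~ Gamma(n, 1),
#    which yields a two-sided p-value.
p_lower = stats.gamma.cdf(s, a=n)
p_value = 2 * min(p_lower, 1 - p_lower)
reject = p_value < 0.05

# 3. Bayesian inference: update a subjective Gamma(a, b) prior on lambda;
#    with exponential data the posterior is Gamma(a + n, b + sum(x)).
a, b = 2.0, 2.0
posterior_mean = (a + n) / (b + s)

print(lam_hat, p_value, reject, posterior_mean)
```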

Decision theory

The choice of which statement (i.e., which statistical inference) to make based on the observed data can often be formalized as a decision problem in which:

  1. making a statistical inference is regarded as an action;

  2. each action can have different consequences, depending on which distribution function $F_{\Xi}(\xi)$ is the true one;

  3. a preference ordering over possible consequences needs to be elicited;

  4. an optimal course of action needs to be taken, consistent with the elicited preferences.

There are several different ways of formalizing such a decision problem. The branch of statistics that analyzes these decision problems is called statistical decision theory.
