Statistical inference is the act of using observed data to infer unknown properties and characteristics of the probability distribution from which the observed data have been generated. The set of data that is used to make inferences is called a sample.
In the simplest possible case, we observe the realizations $x_1, \ldots, x_n$ of $n$ independent random variables $X_1, \ldots, X_n$ having a common distribution function $F_X(x)$, and we use the observed realizations to infer some characteristics of $F_X(x)$. With a slight abuse of language, we sometimes say "$n$ independent realizations of a random variable $X$" instead of saying "the realizations of $n$ independent random variables $X_1, \ldots, X_n$ having a common distribution function $F_X(x)$".
Example The lifetime of a certain type of electronic device is a random variable $X$, whose distribution function $F_X(x)$ is unknown. Suppose we independently observe the lifetimes of $n$ components. Denote these realizations by $x_1$, $x_2$, ..., $x_n$. We are interested in the expected value of $X$, denoted by $\mathrm{E}[X]$, which is an unknown characteristic of $F_X(x)$. We infer $\mathrm{E}[X]$ from the data, estimating it with the sample mean:
$$\bar{x}_n = \frac{1}{n} \sum_{i=1}^{n} x_i$$
In this simple example the observed data $x_1$, $x_2$, ..., $x_n$ constitute our sample and $\mathrm{E}[X]$ is the quantity about which we are making a statistical inference.
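The estimate in the example above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the lifetimes are simulated from an exponential distribution with mean 5.0, an assumption made only so that there is a known value to compare with (in a real application $F_X(x)$ is unknown). The function name `sample_mean` is hypothetical.

```python
import random

def sample_mean(xs):
    """Sample mean: (1/n) times the sum of the n observations."""
    if not xs:
        raise ValueError("sample must contain at least one observation")
    return sum(xs) / len(xs)

# Assumption for illustration only: lifetimes are exponential with mean 5.0.
# In practice F_X is unknown; simulating it lets us see how close we get.
rng = random.Random(0)
true_mean = 5.0
lifetimes = [rng.expovariate(1 / true_mean) for _ in range(10_000)]

estimate = sample_mean(lifetimes)
print(f"estimated E[X] = {estimate:.3f} (simulation's true value: {true_mean})")
```

With many observations the estimate typically lands close to the simulated mean; with few observations it can be noticeably off, which is why inference statements are usually accompanied by some measure of uncertainty.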
While in the simplest case $X_1, \ldots, X_n$ are independent random variables having a common distribution function, more complicated cases are possible. For example:
$X_1, \ldots, X_n$ are not independent;
$X_1, \ldots, X_n$ are random vectors having a common joint distribution function $F_X(x)$;
$X_1, \ldots, X_n$ do not have a common probability distribution.
Is there a definition of sample that generalizes all of the above special cases? Fortunately, there is one and it is extremely simple:
Definition A sample $\xi$ is the realization of a random vector $\Xi$.
The distribution function of $\Xi$, denoted by $F_\Xi(\xi)$, is the unknown distribution function that constitutes the object of inference.
Therefore, 'sample' is just a synonym for 'realization of a random vector'. The following examples show how this general definition accommodates the special cases mentioned above:
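Example When we observe $n$ realizations $x_1, \ldots, x_n$ of $n$ independent random variables $X_1, \ldots, X_n$ having a common distribution function $F_X(x)$, the sample is the $n$-dimensional vector $\xi = [x_1 \ \ldots \ x_n]$, which is a realization of the random vector $\Xi = [X_1 \ \ldots \ X_n]$. The joint distribution function of $\Xi$ is:
$$F_\Xi(\xi) = F_X(x_1) \cdot F_X(x_2) \cdots F_X(x_n)$$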
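The product formula for independent, identically distributed observations can be checked numerically. The sketch below assumes, purely for illustration, that $F_X$ is the distribution function of an exponential random variable with rate 1, evaluates $F_\Xi(\xi)$ at a hypothetical point $\xi = [0.5 \ 1.0 \ 2.0]$ as a product of marginals, and compares it with the share of simulated vectors whose components all fall below the corresponding entries of $\xi$. The helper names `exp_cdf` and `iid_joint_cdf` are hypothetical.

```python
import math
import random

def exp_cdf(x, rate=1.0):
    """Distribution function of an exponential random variable (assumed F_X)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def iid_joint_cdf(xi, marginal_cdf):
    """Joint distribution function of n i.i.d. variables evaluated at xi:
    the product of the common marginal distribution function."""
    return math.prod(marginal_cdf(x) for x in xi)

xi = [0.5, 1.0, 2.0]                 # point at which F_Xi is evaluated
exact = iid_joint_cdf(xi, exp_cdf)

# Monte Carlo check: share of simulated vectors with X_i <= x_i for every i.
rng = random.Random(1)
trials = 100_000
hits = 0
for _ in range(trials):
    draw = [rng.expovariate(1.0) for _ in xi]
    if all(d <= x for d, x in zip(draw, xi)):
        hits += 1
empirical = hits / trials
print(f"product formula: {exact:.4f}, simulation: {empirical:.4f}")
```

The two numbers agree up to simulation noise, as they should when the components are independent and share the same marginal.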
Example When we observe $n$ realizations $x_1, \ldots, x_n$ of $n$ random variables $X_1, \ldots, X_n$ that are not independent but have a common distribution function $F_X(x)$, the sample is again the $n$-dimensional vector $\xi = [x_1 \ \ldots \ x_n]$, which is a realization of the random vector $\Xi = [X_1 \ \ldots \ X_n]$. However, in this case the joint distribution function $F_\Xi(\xi)$ can no longer be written as the product of the distribution functions of $X_1, \ldots, X_n$.
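A small counterexample shows why the product formula breaks down without independence. In this hypothetical setup, $X_1$ is exponential with rate 1 and $X_2 = X_1$, so both variables share the same marginal $F_X$ but are perfectly dependent. Their joint distribution function is $F_\Xi(a, b) = P(X_1 \le a, X_1 \le b) = F_X(\min(a, b))$, which differs from $F_X(a) F_X(b)$. The point $(a, b) = (1, 2)$ is an arbitrary choice.

```python
import math
import random

def exp_cdf(x, rate=1.0):
    """Distribution function of an exponential random variable (assumed F_X)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

# Hypothetical dependent pair: X_2 is an exact copy of X_1, so both share F_X.
rng = random.Random(2)
a, b = 1.0, 2.0
trials = 100_000
hits = 0
for _ in range(trials):
    x1 = rng.expovariate(1.0)
    x2 = x1                          # perfect dependence
    if x1 <= a and x2 <= b:
        hits += 1

empirical = hits / trials
joint = exp_cdf(min(a, b))           # true joint CDF: F_X(min(a, b))
product = exp_cdf(a) * exp_cdf(b)    # what independence would imply
print(f"simulated: {empirical:.4f}, joint: {joint:.4f}, product: {product:.4f}")
```

The simulated probability matches $F_X(\min(a, b))$, not the product of the marginals.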
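Example When we observe $n$ realizations $x_1, \ldots, x_n$ of $n$ independent $K$-dimensional random vectors $X_1, \ldots, X_n$ having a common joint distribution function $F_X(x)$, the sample is the $nK$-dimensional vector $\xi = [x_1 \ \ldots \ x_n]$ obtained by stacking the $n$ observations, which is a realization of the random vector $\Xi = [X_1 \ \ldots \ X_n]$. The joint distribution function of $\Xi$ is:
$$F_\Xi(\xi) = F_X(x_1) \cdot F_X(x_2) \cdots F_X(x_n)$$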
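Example When we observe $n$ realizations $x_1, \ldots, x_n$ of $n$ independent $K$-dimensional random vectors $X_1, \ldots, X_n$ having different joint distribution functions $F_{X_1}(x_1), \ldots, F_{X_n}(x_n)$, the sample is the $nK$-dimensional vector $\xi = [x_1 \ \ldots \ x_n]$, which is a realization of the random vector $\Xi = [X_1 \ \ldots \ X_n]$. The joint distribution function of $\Xi$ is:
$$F_\Xi(\xi) = F_{X_1}(x_1) \cdot F_{X_2}(x_2) \cdots F_{X_n}(x_n)$$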
When the sample $\xi$ is made of $n$ realizations $x_1, \ldots, x_n$ of $n$ random variables (or random vectors):
$$\xi = [x_1 \ \ldots \ x_n]$$
then we say that the sample has size $n$ (or that the sample size is $n$). An individual realization $x_i$ is referred to as an observation from the sample.
In the previous section we defined a sample $\xi$ as a realization of a random vector $\Xi$ having joint distribution function $F_\Xi(\xi)$. The sample $\xi$ is used to infer some characteristics of $F_\Xi(\xi)$ that are not fully known by the statistician. The properties and the characteristics of $F_\Xi(\xi)$ that are already known (or are assumed to be known) before observing the sample are called a model for $\Xi$. In mathematical terms, a model for $\Xi$ is a set of joint distribution functions to which $F_\Xi(\xi)$ is assumed to belong:
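Definition Let the sample $\xi$ be a realization of a $K$-dimensional random vector $\Xi$ having joint distribution function $F_\Xi(\xi)$. Let $\Phi$ be the set of all $K$-dimensional joint distribution functions:
$$\Phi = \{F : \mathbb{R}^K \to [0, 1] \mid F \text{ is a joint distribution function}\}$$
A subset $\Phi_R \subseteq \Phi$ is called a statistical model (or a model specification or, simply, a model) for $\Xi$. If $F_\Xi \in \Phi_R$, the model is said to be correctly specified (or well-specified). Otherwise, if $F_\Xi \notin \Phi_R$, the model is said to be mis-specified.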
Continuing the examples of the previous section:
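Example Suppose our sample is made of $n$ realizations $x_1, \ldots, x_n$ of $n$ random variables $X_1, \ldots, X_n$. Assume that the $n$ random variables are mutually independent and that they have a common distribution function $F_X(x)$. The sample is the $n$-dimensional vector $\xi = [x_1 \ \ldots \ x_n]$. $\Phi$ is the set of all possible distribution functions of the random vector $\Xi = [X_1 \ \ldots \ X_n]$. Recalling the definition of marginal distribution function and the characterization of mutual independence, the statistical model $\Phi_R$ is defined as follows:
$$\Phi_R = \left\{ F \in \Phi : F(x_1, \ldots, x_n) = F_1(x_1) \cdot F_2(x_2) \cdots F_n(x_n) \text{ for all } x_1, \ldots, x_n, \text{ and } F_1 = F_2 = \cdots = F_n \right\}$$
where $F_i$ denotes the marginal distribution function of the $i$-th component of $F$.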
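Example Take the example above and drop the assumption that the $n$ random variables $X_1, \ldots, X_n$ are mutually independent. The statistical model is now:
$$\Phi_R = \left\{ F \in \Phi : F_1 = F_2 = \cdots = F_n \right\}$$
where $F_i$ denotes the marginal distribution function of the $i$-th component of $F$.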
The next subsections introduce some terminology related to model specification.
A model $\Phi_R$ for $\Xi$ is called a parametric model if the joint distribution functions belonging to $\Phi_R$ are put into correspondence with a set $\Theta$ of real vectors:
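Definition Let $\Phi_R$ be a model for $\Xi$. Let $\Theta \subseteq \mathbb{R}^p$ be a set of $p$-dimensional real vectors. Let $\gamma$ be a correspondence that associates a subset of $\Phi_R$ to each $\theta \in \Theta$. The triple $(\Phi_R, \Theta, \gamma)$ is a parametric model if and only if:
$$\Phi_R = \bigcup_{\theta \in \Theta} \gamma(\theta)$$
The set $\Theta$ is called parameter space. A vector $\theta \in \Theta$ is called a parameter.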
Therefore, in a parametric model every element of $\Phi_R$ is put into correspondence with at least one parameter $\theta$.
When $\gamma$ associates to each parameter a unique joint distribution function (i.e. when $\gamma$ is a function) the parametric model is called a parametric family:
Definition Let $(\Phi_R, \Theta, \gamma)$ be a parametric model. If $\gamma$ is a function from $\Theta$ to $\Phi_R$, then the parametric model is called a parametric family. In this case, the joint distribution function associated with a parameter $\theta$ is denoted by $F_\Xi(\xi; \theta)$.
When each distribution function is associated with only one parameter, the parametric family is said to be identifiable:
Definition Let $(\Phi_R, \Theta, \gamma)$ be a parametric family. If $\gamma$ is one-to-one (i.e. each distribution function is associated with only one parameter), then the parametric family is said to be identifiable.
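Identifiability can be illustrated with two hypothetical parametric families of normal distributions with unit variance. In the first, $\theta = (\theta_1, \theta_2) \in \mathbb{R}^2$ is mapped to a normal with mean $\theta_1 + \theta_2$: the map is a function (so this is a parametric family) but not one-to-one, because different parameters such as $(1, 0)$ and $(0, 1)$ give the same distribution. In the second, $\theta = \mu \in \mathbb{R}$ is mapped to a normal with mean $\mu$, which is identifiable. The names `gamma_sum`, `gamma_mean` and `agree_on_grid` are illustrative; comparing distribution functions on a finite grid is only numerical evidence, although here the equality also holds exactly because both parameters produce the mean 1.

```python
import math

def normal_cdf(x, mu, sigma=1.0):
    """Distribution function of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def gamma_sum(theta):
    """Hypothetical family: theta = (t1, t2) in R^2 maps to N(t1 + t2, 1)."""
    t1, t2 = theta
    return lambda x: normal_cdf(x, t1 + t2)

def gamma_mean(theta):
    """Hypothetical family: theta = (mu,) in R maps to N(mu, 1)."""
    (mu,) = theta
    return lambda x: normal_cdf(x, mu)

GRID = [-3.0 + 0.25 * k for k in range(25)]

def agree_on_grid(F, G, tol=1e-12):
    """Numerical check: do two distribution functions coincide on GRID?
    Agreement on a finite grid is evidence, not proof, of equality."""
    return all(abs(F(x) - G(x)) <= tol for x in GRID)

# Two different parameters give the same distribution: gamma_sum is a
# function (a parametric family) but not one-to-one, so not identifiable.
not_identifiable = agree_on_grid(gamma_sum((1.0, 0.0)), gamma_sum((0.0, 1.0)))

# Different means give different distributions under gamma_mean.
distinct = not agree_on_grid(gamma_mean((1.0,)), gamma_mean((0.0,)))
print(not_identifiable, distinct)
```

In a non-identifiable family the data can at best pin down the distribution, not the parameter: here only the sum $\theta_1 + \theta_2$ is recoverable.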
A statistical inference is a statement about the unknown distribution function $F_\Xi$, based on the observed sample $\xi$ and the statistical model $\Phi_R$. Statistical inferences are often chosen among a set of possible inferences and take the form of model restrictions. Given a subset $\Phi_R^* \subseteq \Phi_R$ of the original model $\Phi_R$, a model restriction can be either an inclusion restriction:
$$F_\Xi \in \Phi_R^*$$
or an exclusion restriction:
$$F_\Xi \notin \Phi_R^*$$
The following are common kinds of statistical inferences:
In hypothesis testing, a restriction is proposed and the choice is between two possible statements:
reject the restriction;
do not reject the restriction.
In estimation, a restriction must be chosen among a set of possible restrictions.
In Bayesian inference, the observed sample is used to update the subjective probability that a restriction is true.
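Two of the kinds of inference listed above can be sketched on the same simulated data. Everything here is an assumption made for illustration: the observations are i.i.d. normal with known standard deviation 1 (simulated with true mean 1), the restriction is $\mathrm{E}[X] = 0$, the test is a two-sided z-test with critical value 1.96 (approximately a 5% level), and the Bayesian update uses a hypothetical two-point model in which the only alternative to the restriction is $\mathrm{E}[X] = 1$, with prior probability 0.5 on the restriction. Function names are hypothetical.

```python
import math
import random

def z_test(xs, mu0, sigma=1.0, critical=1.96):
    """Two-sided z-test of the restriction E[X] = mu0, assuming i.i.d. normal
    data with known standard deviation sigma. Returns (decision, z)."""
    n = len(xs)
    z = (sum(xs) / n - mu0) / (sigma / math.sqrt(n))
    return ("reject" if abs(z) > critical else "do not reject"), z

def gaussian_loglik(xs, mu, sigma=1.0):
    """Log-likelihood of i.i.d. N(mu, sigma^2) data, up to an additive constant."""
    return -sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)

def posterior_restriction(xs, prior, mu_r=0.0, mu_a=1.0, sigma=1.0):
    """Posterior probability of the restriction E[X] = mu_r in a hypothetical
    two-point model where the only alternative is E[X] = mu_a."""
    # d is the log of the posterior odds in favour of the alternative;
    # the omitted likelihood constants cancel because sigma is shared.
    d = (math.log((1 - prior) / prior)
         + gaussian_loglik(xs, mu_a, sigma) - gaussian_loglik(xs, mu_r, sigma))
    # Numerically stable evaluation of 1 / (1 + exp(d)).
    if d > 0:
        e = math.exp(-d)
        return e / (1 + e)
    return 1 / (1 + math.exp(d))

rng = random.Random(3)
data = [rng.gauss(1.0, 1.0) for _ in range(400)]  # simulated; true mean is 1

decision, z = z_test(data, mu0=0.0)               # hypothesis testing
post = posterior_restriction(data, prior=0.5)     # Bayesian inference
print(decision, round(z, 2), post)
```

The test outputs one of the two statements "reject" or "do not reject", while the Bayesian update outputs a revised probability that the restriction is true; the two answers come from different formalizations and need not be interpreted the same way.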
The choice of the statement (the statistical inference) to make based on the observed data can often be formalized as a decision problem where:
making a statistical inference is regarded as an action;
each action can have different consequences, depending on which distribution function is the true one;
a preference ordering over possible consequences needs to be elicited;
an optimal course of action needs to be taken, coherently with elicited preferences.
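One of the several possible formalizations of this decision problem is to minimize expected loss: each action (here, reporting an estimate) has a loss that depends on the true state, preferences are encoded in the loss function, and the optimal action minimizes the probability-weighted loss. The sketch below uses hypothetical states, probabilities and candidate actions. It shows that the elicited preferences matter: squared loss selects the mean of the distribution over states, while absolute loss selects a median.

```python
def expected_loss(action, states, probs, loss):
    """Probability-weighted loss of an action across possible true states."""
    return sum(p * loss(action, s) for s, p in zip(states, probs))

def best_action(actions, states, probs, loss):
    """Action with the smallest expected loss."""
    return min(actions, key=lambda a: expected_loss(a, states, probs, loss))

def squared_loss(action, state):
    return (action - state) ** 2

def absolute_loss(action, state):
    return abs(action - state)

# Hypothetical inputs: candidate true values, their probabilities, and the
# estimates (actions) the statistician may report.
states = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]
actions = [1.0, 1.5, 2.0, 2.1, 2.5, 3.0]

print(best_action(actions, states, probs, squared_loss))   # the mean, 2.1
print(best_action(actions, states, probs, absolute_loss))  # a median, 2.0
```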
There are several different ways of formalizing such a decision problem. The branch of statistics that analyzes these decision problems is called statistical decision theory.