Statistical inference is the act of using observed data to infer unknown properties and characteristics of the probability distribution from which the data have been extracted. In other words, we observe some realizations of a random phenomenon and use them to infer characteristics of the distribution that generated them.
Example The lifetime of a certain type of electronic device is a random variable X, whose probability distribution is unknown. Suppose that we independently observe the lifetimes of n components. Denote these realizations by x_1, x_2, ..., x_n. We are interested in the expected value of X, which is an unknown characteristic of its distribution. We use the sample mean x̄ = (x_1 + x_2 + ... + x_n) / n as our estimate (best guess) of the expected value. In this simple example, the sample x_1, x_2, ..., x_n is used to make a statistical inference about a characteristic (the expected value) of the distribution that generated the sample (the probability distribution of X).
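The example can be sketched numerically. In the sketch below we pretend the unknown lifetime distribution is exponential with mean 1000 hours; this choice is purely illustrative, since in a real inference problem the distribution is unknown.

```python
import random

random.seed(42)

# Hypothetical data-generating process: exponential lifetimes with a
# true mean of 1000 hours. In practice the statistician does not know
# this distribution; it is chosen here only to generate a sample.
true_mean = 1000.0
n = 10_000
lifetimes = [random.expovariate(1 / true_mean) for _ in range(n)]

# The sample mean is our estimate of the unknown expected value.
sample_mean = sum(lifetimes) / n
```

With a sample of this size, the sample mean lands close to the true expected value, which is what makes it a sensible estimate.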
The previous example shows that three fundamental elements are required to make a statistical inference:
a sample (the observed data);
a probability distribution that generates the data;
a characteristic of the distribution about which inferences are drawn.
In the next sections we define these three fundamental elements in a mathematically meaningful way.
In the above example of statistical inference, X_1, ..., X_n are independent random variables. However, more complicated cases are possible. For instance,
X_1, ..., X_n are not independent;
X_1, ..., X_n are random vectors having a common probability distribution;
X_1, ..., X_n do not have a common probability distribution.
Is there a definition of sample that generalizes all of the above special cases?
The definition is extremely simple.
Definition A sample x is the realization of a random vector X.
As we will see in the following examples, x is a vector that collects the observed data.
The vector x is considered a realization of a random vector X.
The object of the statistical inferences is the probability distribution of X.
The following examples show how this general definition accommodates the special cases mentioned above.
Note that, from now on, in order to be more precise, we will use the term distribution function instead of generically speaking of probability distribution.
Example We observe the n realizations x_1, ..., x_n of some independent random variables X_1, ..., X_n having a common distribution function F. The sample is the n-dimensional vector x = (x_1, ..., x_n), which is a realization of the random vector X = (X_1, ..., X_n). Since the observations are independent, the joint distribution function of the vector X is equal to the product of the marginal distribution functions of its entries: F_X(x_1, ..., x_n) = F(x_1) · F(x_2) · ... · F(x_n).
The common distribution function of the observations, denoted by F, is the unknown distribution function that constitutes the object of inference.
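The product factorization can be checked empirically. The sketch below (illustrative, not part of the lecture) draws independent pairs of standard uniform variables, for which the marginal distribution function satisfies F(t) = t, and compares the empirical joint distribution function at a point with the product of the marginals.

```python
import random

random.seed(0)
n = 50_000

# Two independent Uniform(0, 1) draws per observation.
pairs = [(random.random(), random.random()) for _ in range(n)]

a, b = 0.3, 0.7  # point at which the distribution functions are evaluated

# Empirical joint distribution function at (a, b):
# the fraction of observations with both coordinates below the point.
joint = sum(1 for u, v in pairs if u <= a and v <= b) / n

# Under independence, the joint value should be close to the product
# of the marginals, F(a) * F(b) = a * b.
product = a * b
```

For dependent variables this equality fails, which is exactly why the factorization characterizes the independent case.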
Example If we take the previous example and drop the assumption of independence, the sample x and the vector X are still defined in the same way, but the joint distribution function F_X can no longer be written as the product of the distribution functions of X_1, ..., X_n.
In the next example the single observations are no longer scalars, but they are vectors.
Example If X_1, ..., X_n are independent K-dimensional random vectors having a common joint distribution function F, then the sample x and the vector X are nK-dimensional. We can still write the joint distribution function of X as F_X(x_1, ..., x_n) = F(x_1) · F(x_2) · ... · F(x_n), where each x_i is now a K-dimensional vector.
In the following example we relax the assumption that all the observations come from a unique distribution.
Example If the K-dimensional random vectors X_1, ..., X_n are independent but have different joint distribution functions F_1, ..., F_n, then x and X are defined as before, but the joint distribution function of X is F_X(x_1, ..., x_n) = F_1(x_1) · F_2(x_2) · ... · F_n(x_n).
When the sample is made of the realizations x_1, ..., x_n of n random variables (or vectors), then we say that the sample size is n.
An individual realization x_i is referred to as an observation from the sample.
We now shift our attention to the probability distribution that generates the sample, which is another one of the fundamental elements of a statistical inference problem.
In the previous section we have defined a sample x as a realization of a random vector X having joint distribution function F_X.
The sample is used to infer some characteristics of F_X that are not fully known by the statistician.
The properties and the characteristics of F_X that are already known (or are assumed to be known) before observing the sample are called a model for X.
In mathematical terms, a model for X is a set of joint distribution functions to which F_X is assumed to belong.
Definition Let the sample x be a realization of an n-dimensional random vector X having joint distribution function F_X. Let Φ be the set of all n-dimensional joint distribution functions. A subset S ⊆ Φ is called a statistical model (or a model specification or, simply, a model) for X.
In this definition,
the set Φ is a large set containing all the possible data-generating distributions;
the set S is a smaller subset of data-generating distributions on which we focus our attention.
The smaller set S is called a statistical model.
The following examples are a continuation of the examples made in the previous section.
Example Suppose that our sample is made of the realizations x_1, ..., x_n of the random variables X_1, ..., X_n. Assume that the random variables are mutually independent and that they have a common distribution function F. The sample is the n-dimensional vector x = (x_1, ..., x_n), and Φ is the set of all possible distribution functions of the random vector X. Recalling the definition of marginal distribution function and the characterization of mutual independence, we can define the statistical model as follows: S = {F_X ∈ Φ : F_X(x_1, ..., x_n) = F(x_1) · ... · F(x_n) for some distribution function F}.
Example Take the example above and drop the assumption that the random variables are mutually independent, while retaining the assumption that they have a common distribution function. The statistical model is now: S = {F_X ∈ Φ : the n marginal distribution functions of F_X are identical}.
If F_X ∈ S, the model is said to be correctly specified (or well-specified).
Otherwise, if F_X ∉ S, the model is said to be mis-specified.
A model S for X is called a parametric model if the joint distribution functions belonging to S are put into correspondence with a set of real vectors.
Definition Let S be a model for X. Let Θ be a set of p-dimensional real vectors. Let γ be a correspondence that associates a subset of S to each θ ∈ Θ. The triple (Θ, S, γ) is a parametric model if and only if S = ∪_{θ ∈ Θ} γ(θ). The set Θ is called the parameter space. A vector θ ∈ Θ is called a parameter.
Therefore, in a parametric model every element of S is put into correspondence with at least one parameter θ ∈ Θ.
When γ associates to each parameter θ a unique joint distribution function (i.e., when γ is a function), the parametric model is called a parametric family.
Definition Let (Θ, S, γ) be a parametric model. If γ is a function from Θ to S, then the parametric model is called a parametric family. In this case, the joint distribution function associated to a parameter θ is denoted by F_X(x; θ).
Here is a classical example of a parametric family.
Example Suppose that X is assumed to have a multivariate normal distribution. Then, the model S is the set of all multivariate normal distributions, which are completely described by two parameters (the mean vector μ and the covariance matrix Σ). Each parameter θ = (μ, Σ) is associated to a unique distribution function in the set S. Therefore, we have a parametric family.
When each distribution function is associated with only one parameter, the parametric family is said to be identifiable.
The set of multivariate normal distributions in the previous example is also an identifiable parametric family because each distribution is associated to a unique parameter.
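Identifiability can fail when two distinct parameters index the same distribution. The following toy sketch (a hypothetical family invented for illustration, not taken from the lecture) parametrizes a normal distribution by θ = (a, b) with mean a + b, so every θ with the same sum indexes the same distribution.

```python
import random

random.seed(3)

def sample_from(theta, n):
    """Draw n observations from Normal(a + b, 1), where theta = (a, b).

    The family is deliberately non-identifiable: the distribution
    depends on theta only through the sum a + b.
    """
    a, b = theta
    return [random.gauss(a + b, 1.0) for _ in range(n)]

# Two distinct parameters ...
theta1, theta2 = (1.0, 2.0), (2.0, 1.0)

# ... index the same distribution, Normal(3, 1), so samples drawn
# under either parameter have (approximately) the same mean.
m1 = sum(sample_from(theta1, 50_000)) / 50_000
m2 = sum(sample_from(theta2, 50_000)) / 50_000
```

Because the data can never distinguish (1, 2) from (2, 1), no amount of observation pins down the parameter; reparametrizing by the sum a + b would restore identifiability.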
A statistical inference is a statement about the unknown distribution function F_X, based on the observed sample x and the statistical model S.
The following are common kinds of statistical inferences.
Hypothesis testing: we make a hypothesis about some feature of the distribution F_X and we use the data to decide whether to reject the hypothesis or not;
Point estimation: we use the data to estimate the value of a parameter θ of the data-generating distribution F_X.
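Point estimation can be sketched concretely. Below, the parametric family is assumed (for illustration only) to be the univariate normal family indexed by θ = (μ, σ), and the sample mean and sample standard deviation serve as point estimates of the two components of θ.

```python
import random
import statistics

random.seed(1)

# True parameters of the hypothetical data-generating distribution,
# unknown to the statistician in a real problem.
mu, sigma = 2.0, 0.5
sample = [random.gauss(mu, sigma) for _ in range(20_000)]

# Point estimates of the parameter vector (mu, sigma):
# the sample mean and the sample standard deviation.
mu_hat = statistics.fmean(sample)
sigma_hat = statistics.stdev(sample)
```

With 20,000 observations, both estimates fall very close to the true parameter values, illustrating what a point estimate is meant to do.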
Often, we make statistical inferences about model restrictions.
Given a subset S_R of the original model S, a model restriction can be either an inclusion restriction: F_X ∈ S_R, or an exclusion restriction: F_X ∉ S_R.
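For instance, the inclusion restriction "the mean of the data-generating distribution equals zero" can be confronted with the data. A minimal sketch of such a test, under the illustrative assumptions of normal data with known unit variance (neither assumption comes from the lecture):

```python
import math
import random

random.seed(7)

# Data actually generated with mean 0.3, so we expect the restriction
# "mean = 0" to be rejected. (Illustrative choice.)
n, true_mean = 400, 0.3
data = [random.gauss(true_mean, 1.0) for _ in range(n)]

# z-statistic for H0: mean = 0, assuming a known unit variance:
# the sample mean divided by its standard deviation 1 / sqrt(n).
z = (sum(data) / n) / (1.0 / math.sqrt(n))

# Reject the restriction at the 5% level when |z| > 1.96.
reject = abs(z) > 1.96
```

Rejecting the inclusion restriction amounts to the statistical inference that F_X does not belong to the restricted subset of the model.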
The choice of the statement (the statistical inference) to make based on the observed data can often be formalized as a decision problem where:
making a statistical inference is regarded as an action;
each action can have different consequences, depending on which distribution function is the true one;
a preference ordering over possible consequences needs to be elicited;
an optimal course of action needs to be taken, consistently with the elicited preferences.
There are several different ways of formalizing such a decision problem. The branch of statistics that analyzes these decision problems is called statistical decision theory.
In this lecture we have touched on several important topics, which are covered in more depth in other lectures on this website.
Please cite as:
Taboga, Marco (2021). "Statistical inference", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/statistical-inference.