This lecture presents some examples of Hypothesis testing, focusing on tests of hypothesis about the mean, i.e. on using a sample to perform tests of hypothesis about the mean of an unknown distribution.
In this example we make the same assumptions we made in the example of set estimation of the mean entitled Set estimation of the mean - Normal IID samples. The reader is strongly advised to read that example before reading this one.
In this example, the sample $\xi_n$ is made of $n$ independent draws from a normal distribution having unknown mean $\mu$ and known variance $\sigma^2$. Specifically, we observe $n$ realizations $x_1, \ldots, x_n$ of $n$ independent random variables $X_1, \ldots, X_n$, all having a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. The sample is the $n$-dimensional vector $\xi_n = [x_1 \ \ldots \ x_n]$, which is a realization of the random vector $\Xi_n = [X_1 \ \ldots \ X_n]$.
We test the null hypothesis that the mean $\mu$ is equal to a specific value $\mu_0$:

$$H_0: \mu = \mu_0$$

We assume that the parameter space is the whole real line, i.e. $\mu \in \mathbb{R}$. Therefore, the alternative hypothesis is:

$$H_1: \mu \neq \mu_0$$
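To construct a test statistic, we use the sample mean $\bar{X}_n$:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

The test statistic $Z_n$ is:

$$Z_n = \frac{\bar{X}_n - \mu_0}{\sqrt{\sigma^2/n}}$$

This test statistic is often called the z-statistic or normal z-statistic, and a test of hypothesis based on this statistic is called a z-test or normal z-test.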
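As a small illustration of the z-statistic defined above, the following Python sketch computes it for a made-up sample. The function name `z_statistic`, the sample values and the chosen $\mu_0$ and $\sigma^2$ are assumptions introduced here for illustration only.

```python
import math

def z_statistic(sample, mu0, sigma2):
    """z-statistic: (sample mean - mu0) / sqrt(sigma2 / n), with sigma2 known."""
    n = len(sample)
    sample_mean = sum(sample) / n
    return (sample_mean - mu0) / math.sqrt(sigma2 / n)

# Hypothetical data: four draws, null value 5, known variance 4.
z_n = z_statistic([4.0, 6.0, 5.0, 7.0], mu0=5.0, sigma2=4.0)
print(z_n)  # 0.5
```

Here the sample mean is 5.5 and $\sqrt{\sigma^2/n} = 1$, so the statistic equals 0.5.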
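Let $z \in \mathbb{R}_{++}$, i.e. $z > 0$. We reject the null hypothesis $H_0$ if $Z_n > z$ or if $Z_n < -z$. In other words, the critical region is:

$$C_{Z_n} = (-\infty, -z) \cup (z, \infty)$$

Thus, the critical values of the test are $-z$ and $z$.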
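The power function of the test is:

$$\pi(\mu) = \mathrm{P}_{\mu}\left(Z_n \notin [-z, z]\right) = 1 - \mathrm{P}\left(-z + \frac{\mu_0 - \mu}{\sqrt{\sigma^2/n}} \le Z \le z + \frac{\mu_0 - \mu}{\sqrt{\sigma^2/n}}\right)$$

where $Z$ is a standard normal random variable and the notation $\mathrm{P}_{\mu}$ is used to indicate the fact that the probability of rejecting the null hypothesis is computed under the hypothesis that the true mean is equal to $\mu$.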
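The power function can be written as:

$$\begin{aligned}
\pi(\mu) &= \mathrm{P}_{\mu}\left(Z_n \notin [-z, z]\right) \\
&= 1 - \mathrm{P}_{\mu}\left(-z \le Z_n \le z\right) \\
&= 1 - \mathrm{P}_{\mu}\left(-z \le \frac{\bar{X}_n - \mu_0}{\sqrt{\sigma^2/n}} \le z\right) \\
&= 1 - \mathrm{P}_{\mu}\left(-z \le \frac{\bar{X}_n - \mu}{\sqrt{\sigma^2/n}} + \frac{\mu - \mu_0}{\sqrt{\sigma^2/n}} \le z\right) \\
&= 1 - \mathrm{P}\left(-z + \frac{\mu_0 - \mu}{\sqrt{\sigma^2/n}} \le Z \le z + \frac{\mu_0 - \mu}{\sqrt{\sigma^2/n}}\right)
\end{aligned}$$

where we have defined

$$Z = \frac{\bar{X}_n - \mu}{\sqrt{\sigma^2/n}}$$

As demonstrated in the lecture entitled Point estimation of the mean, the sample mean $\bar{X}_n$ has a normal distribution with mean $\mu$ and variance $\sigma^2/n$, given the assumptions on the sample $\xi_n$ we made above. Subtracting the mean of a normal random variable from the random variable itself and dividing the difference by the square root of its variance, one obtains a standard normal random variable. Therefore, the variable $Z$ has a standard normal distribution.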
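The power function of the z-test can be evaluated numerically from the standard normal distribution function. The sketch below does this with Python's standard library. The function name `z_test_power` and the numerical values are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu, mu0, sigma2, n, z):
    """Probability of rejecting H0: mean = mu0 when the true mean is mu."""
    shift = (mu0 - mu) / sqrt(sigma2 / n)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # 1 - P(-z + shift <= Z <= z + shift) for a standard normal Z
    return 1.0 - (std_normal.cdf(z + shift) - std_normal.cdf(-z + shift))

# Hypothetical setting: n = 25 draws, known variance 1, critical value 1.96.
power_at_null = z_test_power(mu=0.0, mu0=0.0, sigma2=1.0, n=25, z=1.96)
power_far = z_test_power(mu=1.0, mu0=0.0, sigma2=1.0, n=25, z=1.96)
print(power_at_null, power_far)
```

The power is close to 0.05 when the true mean coincides with $\mu_0$ and approaches 1 as the true mean moves away from it.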
When evaluated at the point $\mu = \mu_0$, the power function is equal to the probability of committing a Type I error, i.e. the probability of rejecting the null hypothesis when the null hypothesis is true. This probability is called the size of the test and it is equal to:

$$\pi(\mu_0) = 1 - \mathrm{P}\left(-z \le Z \le z\right)$$

where $Z$ is a standard normal random variable (this is trivially obtained by substituting $\mu$ with $\mu_0$ in the formula for the power function found above).
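Since the size depends only on $z$, the formula can be inverted to find the critical value that yields a desired size $\alpha$: by the symmetry of the standard normal distribution, $1 - \mathrm{P}(-z \le Z \le z) = \alpha$ when $z$ is the $1 - \alpha/2$ quantile. The following sketch illustrates this. The function names and the choice $\alpha = 0.05$ are assumptions made here for illustration.

```python
from statistics import NormalDist

def z_test_size(z):
    """Size of the two-sided z-test with critical values -z and z."""
    std_normal = NormalDist()
    return 1.0 - (std_normal.cdf(z) - std_normal.cdf(-z))

def critical_value_for_size(alpha):
    """Positive z such that the z-test has size alpha (0 < alpha < 1)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

z_5pct = critical_value_for_size(0.05)
size_check = z_test_size(z_5pct)
print(z_5pct, size_check)
```

For $\alpha = 0.05$ the critical value is approximately 1.96, the familiar threshold of the two-sided z-test.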
This example is similar to the previous one. The only difference is that we now relax the assumption that the variance of the distribution is known.
In this example, the sample $\xi_n$ is made of $n$ independent draws from a normal distribution having unknown mean $\mu$ and unknown variance $\sigma^2$. Specifically, we observe $n$ realizations $x_1, \ldots, x_n$ of $n$ independent random variables $X_1, \ldots, X_n$, all having a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$. The sample is the $n$-dimensional vector $\xi_n = [x_1 \ \ldots \ x_n]$, which is a realization of the random vector $\Xi_n = [X_1 \ \ldots \ X_n]$.
We test the null hypothesis that the mean $\mu$ is equal to a specific value $\mu_0$:

$$H_0: \mu = \mu_0$$

We assume that the parameter space is the whole real line, i.e. $\mu \in \mathbb{R}$. Therefore, the alternative hypothesis is:

$$H_1: \mu \neq \mu_0$$
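We construct two test statistics, using the sample mean $\bar{X}_n$:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

and either the unadjusted sample variance:

$$S_n^2 = \frac{1}{n}\sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2$$

or the adjusted sample variance:

$$s_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2$$

The two test statistics are:

$$Z_n^u = \frac{\bar{X}_n - \mu_0}{\sqrt{S_n^2/n}}, \qquad Z_n^a = \frac{\bar{X}_n - \mu_0}{\sqrt{s_n^2/n}}$$

where the superscripts $u$ and $a$ indicate whether the test statistic is based on the unadjusted or the adjusted sample variance. These two test statistics are often called t-statistics or Student's t-statistics and tests of hypothesis based on these statistics are called t-tests or Student's t-tests.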
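To make the difference between the two t-statistics concrete, the sketch below computes both for a made-up sample. The function name `t_statistics` and the data are assumptions introduced for illustration. Because the two sample variances differ only by the factor $n/(n-1)$, the statistic based on the unadjusted variance always equals $\sqrt{n/(n-1)}$ times the one based on the adjusted variance.

```python
import math

def t_statistics(sample, mu0):
    """Return (z_u, z_a): t-statistics from the unadjusted and adjusted sample variance."""
    n = len(sample)
    sample_mean = sum(sample) / n
    sum_sq_dev = sum((x - sample_mean) ** 2 for x in sample)
    unadjusted_var = sum_sq_dev / n       # divides by n
    adjusted_var = sum_sq_dev / (n - 1)   # divides by n - 1
    z_u = (sample_mean - mu0) / math.sqrt(unadjusted_var / n)
    z_a = (sample_mean - mu0) / math.sqrt(adjusted_var / n)
    return z_u, z_a

# Hypothetical data: four draws, null value 5.
z_u, z_a = t_statistics([4.0, 6.0, 5.0, 7.0], mu0=5.0)
print(z_u, z_a)
```

For this sample, `z_u` is about 0.894 and `z_a` is about 0.775, and their ratio is $\sqrt{4/3}$.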
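Let $z \in \mathbb{R}_{++}$, i.e. $z > 0$. We reject the null hypothesis $H_0$ if $Z_n^i > z$ or if $Z_n^i < -z$ (for $i = u$ or $i = a$). In other words, the critical region is:

$$C_{Z_n^i} = (-\infty, -z) \cup (z, \infty)$$

Thus, the critical values of the test are $-z$ and $z$.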
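The power function of the test based on the unadjusted sample variance is:

$$\pi^u(\mu) = \mathrm{P}_{\mu}\left(Z_n^u \notin [-z, z]\right) = 1 - \mathrm{P}\left(-\sqrt{\frac{n-1}{n}}\, z \le W_{n-1} \le \sqrt{\frac{n-1}{n}}\, z\right)$$

where the notation $\mathrm{P}_{\mu}$ is used to indicate the fact that the probability of rejecting the null hypothesis is computed under the hypothesis that the true mean is equal to $\mu$ and $W_{n-1}$ is a random variable having a non-central standard Student's t distribution with $n-1$ degrees of freedom and non-centrality parameter equal to:

$$c = \frac{\mu - \mu_0}{\sqrt{\sigma^2/n}}$$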
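The power function can be written as:

$$\begin{aligned}
\pi^u(\mu) &= \mathrm{P}_{\mu}\left(Z_n^u \notin [-z, z]\right) \\
&= 1 - \mathrm{P}_{\mu}\left(-z \le \frac{\bar{X}_n - \mu_0}{\sqrt{S_n^2/n}} \le z\right) \\
&= 1 - \mathrm{P}_{\mu}\left(-\sqrt{\frac{n-1}{n}}\, z \le \sqrt{\frac{n-1}{n}}\, \frac{\bar{X}_n - \mu_0}{\sqrt{S_n^2/n}} \le \sqrt{\frac{n-1}{n}}\, z\right) \\
&= 1 - \mathrm{P}\left(-\sqrt{\frac{n-1}{n}}\, z \le W_{n-1} \le \sqrt{\frac{n-1}{n}}\, z\right)
\end{aligned}$$

where we have defined

$$W_{n-1} = \sqrt{\frac{n-1}{n}}\, \frac{\bar{X}_n - \mu_0}{\sqrt{S_n^2/n}} = \frac{\dfrac{\bar{X}_n - \mu}{\sqrt{\sigma^2/n}} + \dfrac{\mu - \mu_0}{\sqrt{\sigma^2/n}}}{\sqrt{\dfrac{n}{n-1}\dfrac{S_n^2}{\sigma^2}}}$$

Given the assumptions on the sample $\xi_n$ we made above, the sample mean $\bar{X}_n$ has a normal distribution with mean $\mu$ and variance $\sigma^2/n$ (see Point estimation of the mean), so that the random variable

$$\frac{\bar{X}_n - \mu}{\sqrt{\sigma^2/n}}$$

has a standard normal distribution. Furthermore, the unadjusted sample variance $S_n^2$ has a Gamma distribution with parameters $n-1$ and $\frac{n-1}{n}\sigma^2$ (see Point estimation of the variance; the first parameter is the number of degrees of freedom and the second is the mean of the distribution), so that the random variable

$$\frac{n}{n-1}\frac{S_n^2}{\sigma^2}$$

has a Gamma distribution with parameters $n-1$ and $1$. For normal IID samples, the sample mean and the sample variance are independent. Adding a constant $c$ to a standard normal random variable and dividing the sum thus obtained by the square root of an independent Gamma random variable with parameters $n-1$ and $1$, one obtains a non-central standard Student's t distribution with $n-1$ degrees of freedom and non-centrality parameter $c$. Therefore, the random variable $W_{n-1}$ has a non-central standard Student's t distribution with $n-1$ degrees of freedom and non-centrality parameter

$$c = \frac{\mu - \mu_0}{\sqrt{\sigma^2/n}}$$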
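The power function of the test based on the adjusted sample variance is:

$$\pi^a(\mu) = \mathrm{P}_{\mu}\left(Z_n^a \notin [-z, z]\right) = 1 - \mathrm{P}\left(-z \le W_{n-1} \le z\right)$$

where the notation $\mathrm{P}_{\mu}$ is used to indicate the fact that the probability of rejecting the null hypothesis is computed under the hypothesis that the true mean is equal to $\mu$ and $W_{n-1}$ is a random variable having a non-central standard Student's t distribution with $n-1$ degrees of freedom and non-centrality parameter equal to:

$$c = \frac{\mu - \mu_0}{\sqrt{\sigma^2/n}}$$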
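The power function can be written as:

$$\begin{aligned}
\pi^a(\mu) &= \mathrm{P}_{\mu}\left(Z_n^a \notin [-z, z]\right) \\
&= 1 - \mathrm{P}_{\mu}\left(-z \le \frac{\bar{X}_n - \mu_0}{\sqrt{s_n^2/n}} \le z\right) \\
&= 1 - \mathrm{P}\left(-z \le W_{n-1} \le z\right)
\end{aligned}$$

where we have defined

$$W_{n-1} = \frac{\bar{X}_n - \mu_0}{\sqrt{s_n^2/n}} = \frac{\dfrac{\bar{X}_n - \mu}{\sqrt{\sigma^2/n}} + \dfrac{\mu - \mu_0}{\sqrt{\sigma^2/n}}}{\sqrt{\dfrac{s_n^2}{\sigma^2}}}$$

Given the assumptions on the sample $\xi_n$ we made above, the sample mean $\bar{X}_n$ has a normal distribution with mean $\mu$ and variance $\sigma^2/n$ (see Point estimation of the mean), so that the random variable

$$\frac{\bar{X}_n - \mu}{\sqrt{\sigma^2/n}}$$

has a standard normal distribution. Furthermore, the adjusted sample variance $s_n^2$ has a Gamma distribution with parameters $n-1$ and $\sigma^2$ (see Point estimation of the variance; the first parameter is the number of degrees of freedom and the second is the mean of the distribution), so that the random variable

$$\frac{s_n^2}{\sigma^2}$$

has a Gamma distribution with parameters $n-1$ and $1$. For normal IID samples, the sample mean and the sample variance are independent. Adding a constant $c$ to a standard normal random variable and dividing the sum thus obtained by the square root of an independent Gamma random variable with parameters $n-1$ and $1$, one obtains a non-central standard Student's t distribution with $n-1$ degrees of freedom and non-centrality parameter $c$. Therefore, the random variable $W_{n-1}$ has a non-central standard Student's t distribution with $n-1$ degrees of freedom and non-centrality parameter

$$c = \frac{\mu - \mu_0}{\sqrt{\sigma^2/n}}$$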
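Note that, for a fixed $z$, the test based on the unadjusted sample variance is more powerful than the test based on the adjusted sample variance, i.e.:

$$\pi^u(\mu) > \pi^a(\mu)$$

because

$$\sqrt{\frac{n-1}{n}}\, z < z$$

and, as a consequence

$$\mathrm{P}\left(-\sqrt{\frac{n-1}{n}}\, z \le W_{n-1} \le \sqrt{\frac{n-1}{n}}\, z\right) < \mathrm{P}\left(-z \le W_{n-1} \le z\right)$$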
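The ordering of the two power functions can also be checked by simulation. The following sketch estimates the rejection frequency of both t-tests, using the same critical value $z$, on the same simulated normal samples. It is a Monte Carlo illustration, not an exact computation. The function name `rejection_rates`, the seed and all numerical settings are assumptions chosen for illustration.

```python
import math
import random

def rejection_rates(mu, mu0, sigma, n, z, n_sims=2000, seed=0):
    """Monte Carlo estimates of the power of the two t-tests at true mean mu."""
    rng = random.Random(seed)
    rejections_u = 0
    rejections_a = 0
    for _ in range(n_sims):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq_dev = sum((x - mean) ** 2 for x in sample)
        z_u = (mean - mu0) / math.sqrt(sum_sq_dev / n / n)        # unadjusted variance
        z_a = (mean - mu0) / math.sqrt(sum_sq_dev / (n - 1) / n)  # adjusted variance
        rejections_u += abs(z_u) > z
        rejections_a += abs(z_a) > z
    return rejections_u / n_sims, rejections_a / n_sims

# Hypothetical setting: true mean 0.5, null value 0, n = 10, critical value 2.
power_u, power_a = rejection_rates(mu=0.5, mu0=0.0, sigma=1.0, n=10, z=2.0)
print(power_u, power_a)
```

Since the statistic based on the unadjusted variance is larger in absolute value on every sample, it rejects whenever the other one does, so the first estimate can never be smaller than the second.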
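The size of the test based on the unadjusted sample variance is equal to:

$$\pi^u(\mu_0) = 1 - \mathrm{P}\left(-\sqrt{\frac{n-1}{n}}\, z \le W_{n-1} \le \sqrt{\frac{n-1}{n}}\, z\right)$$

where $W_{n-1}$ is a random variable having a standard Student's t distribution with $n-1$ degrees of freedom.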
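When evaluated at the point $\mu = \mu_0$, the power function is equal to the size of the test, i.e. the probability of committing a Type I error. The power function evaluated at $\mu_0$ is:

$$\pi^u(\mu_0) = 1 - \mathrm{P}\left(-\sqrt{\frac{n-1}{n}}\, z \le W_{n-1} \le \sqrt{\frac{n-1}{n}}\, z\right)$$

where $W_{n-1}$ is a random variable having a non-central standard Student's t distribution with $n-1$ degrees of freedom and non-centrality parameter equal to:

$$c = \frac{\mu - \mu_0}{\sqrt{\sigma^2/n}}$$

Therefore, when $\mu = \mu_0$, the non-centrality parameter is equal to $0$ and the distribution of $W_{n-1}$ is just a standard Student's t distribution.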
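The size of the test based on the adjusted sample variance is equal to:

$$\pi^a(\mu_0) = 1 - \mathrm{P}\left(-z \le W_{n-1} \le z\right)$$

where $W_{n-1}$ is a random variable having a standard Student's t distribution with $n-1$ degrees of freedom.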
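When evaluated at the point $\mu = \mu_0$, the power function is equal to the size of the test, i.e. the probability of committing a Type I error. The power function evaluated at $\mu_0$ is:

$$\pi^a(\mu_0) = 1 - \mathrm{P}\left(-z \le W_{n-1} \le z\right)$$

where $W_{n-1}$ is a random variable having a non-central standard Student's t distribution with $n-1$ degrees of freedom and non-centrality parameter equal to:

$$c = \frac{\mu - \mu_0}{\sqrt{\sigma^2/n}}$$

Therefore, when $\mu = \mu_0$, the non-centrality parameter is equal to $0$ and the distribution of $W_{n-1}$ is just a standard Student's t distribution.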
Note that, for a fixed $z$, the test based on the unadjusted sample variance has a greater size than the test based on the adjusted sample variance, because, as demonstrated above, the former also has a greater power than the latter for any value of the true parameter $\mu$.
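The sizes of the two t-tests can be computed numerically by integrating the density of the standard Student's t distribution with $n-1$ degrees of freedom. The sketch below uses Simpson's rule and only the Python standard library. The function names, the integration scheme and the example values ($n = 10$, $z = 2.262$) are assumptions made here for illustration; a statistical library would normally provide the distribution function directly.

```python
import math

def student_t_pdf(t, df):
    """Density of a standard Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def central_probability(bound, df, n_intervals=2000):
    """P(-bound <= W <= bound) for a standard t variable W, by Simpson's rule."""
    h = bound / n_intervals  # n_intervals must be even
    total = student_t_pdf(0.0, df) + student_t_pdf(bound, df)
    for k in range(1, n_intervals):
        weight = 4 if k % 2 else 2
        total += weight * student_t_pdf(k * h, df)
    # Integral over [0, bound], doubled because the density is symmetric around 0.
    return 2.0 * total * h / 3.0

def t_test_sizes(n, z):
    """Return (size_u, size_a) for sample size n and critical value z."""
    df = n - 1
    size_u = 1.0 - central_probability(math.sqrt((n - 1) / n) * z, df)
    size_a = 1.0 - central_probability(z, df)
    return size_u, size_a

size_u, size_a = t_test_sizes(n=10, z=2.262)
print(size_u, size_a)
```

With these values the test based on the adjusted sample variance has size close to 0.05, while the test based on the unadjusted sample variance, which uses the same $z$, has a larger size.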