To better understand the material presented here, you should be familiar with the main concepts of hypothesis testing in a maximum likelihood (ML) framework (see the introductory lecture entitled Maximum likelihood - Hypothesis testing).
The score test allows us to deal with null hypotheses of the following kind:
$$H_0: g(\theta_0) = 0$$
where $\theta_0$ is an unknown parameter belonging to a parameter space $\Theta \subseteq \mathbb{R}^p$, and $g: \mathbb{R}^p \to \mathbb{R}^r$ is a vector valued function ($r \le p$).
As we have explained in the introductory lecture mentioned above, most of the common parameter restrictions that one might want to test can be written in the form $g(\theta_0) = 0$.
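The statistic employed in the score test is based on the ML estimate $\widehat{\theta}_n^R$ that is obtained from the solution of the constrained optimization problem
$$\widehat{\theta}_n^R = \arg\max_{\theta \in \Theta_R} \ln L(\theta; \xi_n)$$
where $\xi_n$ is the sample of observed data, $L(\theta; \xi_n)$ is the likelihood function, and $\Theta_R = \{\theta \in \Theta : g(\theta) = 0\}$ is the set of parameters that satisfy the restriction that is being tested.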
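As an illustration of the constrained problem, here is a minimal Python sketch. It assumes an i.i.d. normal sample with parameter $\theta = (\mu, \sigma^2)$ and the restriction $g(\theta) = \mu - \mu_0 = 0$; the model, the restriction and the function name `restricted_normal_mle` are assumptions made for this example and do not come from the lecture. With $\mu$ fixed at $\mu_0$, the log-likelihood is maximized by setting $\sigma^2$ equal to the mean squared deviation from $\mu_0$.

```python
def restricted_normal_mle(sample, mu0):
    """Constrained ML estimate of (mu, sigma2) for an i.i.d. normal sample.

    Hypothetical example: the restriction is g(theta) = mu - mu0 = 0,
    so mu is fixed at mu0 and only sigma2 is estimated.
    """
    n = len(sample)
    # With mu fixed at mu0, the log-likelihood is maximized by the
    # average squared deviation of the observations from mu0.
    sigma2 = sum((x - mu0) ** 2 for x in sample) / n
    return mu0, sigma2


mu_r, sigma2_r = restricted_normal_mle([1.0, 3.0], mu0=2.0)
print(mu_r, sigma2_r)
```

In models where the restricted set has no closed-form maximizer, the same estimate would have to be obtained with a numerical constrained optimizer.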
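The test statistic, called score statistic, is
$$LM_n = \frac{1}{n} \nabla_\theta \ln L(\widehat{\theta}_n^R; \xi_n)^\top \, \widehat{V}_n \, \nabla_\theta \ln L(\widehat{\theta}_n^R; \xi_n)$$
where $n$ is the sample size, $\widehat{V}_n$ is a consistent estimate of the asymptotic covariance matrix of the unconstrained maximum likelihood estimator $\widehat{\theta}_n$ (see the lecture entitled Maximum likelihood - Covariance matrix estimation), and $\nabla_\theta \ln L(\theta; \xi_n)$ is the gradient of the log-likelihood function (called score), that is, the vector of partial derivatives of the log-likelihood function with respect to the entries of the parameter vector $\theta$. The score is evaluated at the constrained estimate $\widehat{\theta}_n^R$.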
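To make the formula concrete, the following sketch computes the score statistic in a deliberately simple setting that is an assumption of this example, not part of the lecture: an i.i.d. Bernoulli sample with success probability $\theta$ and the null hypothesis $\theta_0 = p_0$, that is, $g(\theta) = \theta - p_0$. The restricted set contains the single point $p_0$, so the constrained estimate is $p_0$ itself, and the asymptotic variance of the ML estimator, $\theta(1-\theta)$, is estimated by $p_0(1-p_0)$. The function name is hypothetical.

```python
def bernoulli_score_statistic(sample, p0):
    """Score (LM) statistic for H0: theta = p0 in an i.i.d. Bernoulli model.

    Hypothetical example: `sample` is a sequence of 0/1 outcomes.
    """
    n = len(sample)
    successes = sum(sample)
    # Score: derivative of the log-likelihood with respect to theta,
    # evaluated at the constrained estimate p0.
    score = successes / p0 - (n - successes) / (1 - p0)
    # Estimate of the asymptotic variance of the ML estimator under H0.
    v_hat = p0 * (1 - p0)
    # LM_n = (1/n) * score' * V_hat * score (all scalars here).
    return score * v_hat * score / n


sample = [1] * 40 + [0] * 60
print(bernoulli_score_statistic(sample, p0=0.5))  # 4.0
```

In this scalar case the statistic reduces to $n(\bar{x} - p_0)^2 / (p_0(1-p_0))$, which gives $100 \cdot 0.01 / 0.25 = 4$ for the sample above.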
In order to derive the asymptotic properties of the statistic $LM_n$, the following assumptions will be maintained:
the sample $\xi_n$ and the likelihood function $L$ satisfy some set of conditions that are sufficient to guarantee consistency and asymptotic normality of the unconstrained maximum likelihood estimator $\widehat{\theta}_n$ (see the lecture on maximum likelihood estimation for a set of such conditions);
for each $\theta \in \Theta$, the entries of $g(\theta)$ are continuously differentiable with respect to all entries of $\theta$;
the $r \times p$ matrix of the partial derivatives of the entries of $g(\theta)$ with respect to the entries of $\theta$, called the Jacobian of $g$ and denoted by $J_g(\theta)$, has rank $r$.
Given the above assumptions, and under the null hypothesis that $g(\theta_0) = 0$, the statistic $LM_n$ converges in distribution to a Chi-square distribution.
Proposition. Provided some technical conditions are satisfied (see above), and provided the null hypothesis $g(\theta_0) = 0$ is true, the score statistic $LM_n$ converges in distribution to a Chi-square distribution with $r$ degrees of freedom.
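Proof. Denote by $\widehat{\theta}_n$ the unconstrained maximum likelihood estimate:
$$\widehat{\theta}_n = \arg\max_{\theta \in \Theta} \ln L(\theta; \xi_n)$$
By the Mean Value Theorem, we have that
$$g(\widehat{\theta}_n^R) = g(\widehat{\theta}_n) + J_g(\bar{\theta}_n)(\widehat{\theta}_n^R - \widehat{\theta}_n)$$
where $\bar{\theta}_n$ is an intermediate point (a vector whose components lie strictly between the components of $\widehat{\theta}_n^R$ and those of $\widehat{\theta}_n$). Since $g(\widehat{\theta}_n^R) = 0$, we have that
$$0 = g(\widehat{\theta}_n) + J_g(\bar{\theta}_n)(\widehat{\theta}_n^R - \widehat{\theta}_n)$$
Therefore,
$$J_g(\bar{\theta}_n)(\widehat{\theta}_n^R - \widehat{\theta}_n) = -g(\widehat{\theta}_n)$$
Again by the Mean Value Theorem, we have that
$$\nabla_\theta \ln L(\widehat{\theta}_n^R; \xi_n) = \nabla_\theta \ln L(\widehat{\theta}_n; \xi_n) + \nabla_{\theta\theta} \ln L(\ddot{\theta}_n; \xi_n)(\widehat{\theta}_n^R - \widehat{\theta}_n)$$
where $\nabla_{\theta\theta} \ln L$ is the Hessian matrix (a matrix of second partial derivatives) and $\ddot{\theta}_n$ is an intermediate point (to be precise, there is a different intermediate point for each row of the Hessian). Because the gradient is zero at an unconstrained maximum, we have that
$$\nabla_\theta \ln L(\widehat{\theta}_n; \xi_n) = 0$$
and, as a consequence,
$$\nabla_\theta \ln L(\widehat{\theta}_n^R; \xi_n) = \nabla_{\theta\theta} \ln L(\ddot{\theta}_n; \xi_n)(\widehat{\theta}_n^R - \widehat{\theta}_n)$$
and
$$\widehat{\theta}_n^R - \widehat{\theta}_n = \left[\nabla_{\theta\theta} \ln L(\ddot{\theta}_n; \xi_n)\right]^{-1} \nabla_\theta \ln L(\widehat{\theta}_n^R; \xi_n)$$
It follows that
$$J_g(\bar{\theta}_n)\left[\nabla_{\theta\theta} \ln L(\ddot{\theta}_n; \xi_n)\right]^{-1} \nabla_\theta \ln L(\widehat{\theta}_n^R; \xi_n) = -g(\widehat{\theta}_n)$$
Now, the first-order conditions of the constrained problem, written with the Lagrangian $\ln L(\theta; \xi_n) + \lambda^\top g(\theta)$, give
$$\nabla_\theta \ln L(\widehat{\theta}_n^R; \xi_n) = -J_g(\widehat{\theta}_n^R)^\top \lambda_n$$
where $\lambda_n$ is a vector of Lagrange multipliers. Thus, we have that
$$-J_g(\bar{\theta}_n)\left[\nabla_{\theta\theta} \ln L(\ddot{\theta}_n; \xi_n)\right]^{-1} J_g(\widehat{\theta}_n^R)^\top \lambda_n = -g(\widehat{\theta}_n)$$
Solving for $\lambda_n$, we obtain
$$\lambda_n = \left[J_g(\bar{\theta}_n)\left[\nabla_{\theta\theta} \ln L(\ddot{\theta}_n; \xi_n)\right]^{-1} J_g(\widehat{\theta}_n^R)^\top\right]^{-1} g(\widehat{\theta}_n)$$
Now, the score statistic can be written as
$$LM_n = \frac{1}{n} \lambda_n^\top J_g(\widehat{\theta}_n^R) \, \widehat{V}_n \, J_g(\widehat{\theta}_n^R)^\top \lambda_n$$
Plugging in the previously derived expression for $\lambda_n$, the statistic becomes
$$LM_n = \sqrt{n}\, g(\widehat{\theta}_n)^\top \left(A_n^{-1}\right)^\top J_g(\widehat{\theta}_n^R) \, \widehat{V}_n \, J_g(\widehat{\theta}_n^R)^\top A_n^{-1} \sqrt{n}\, g(\widehat{\theta}_n)$$
where
$$A_n = J_g(\bar{\theta}_n) \bar{H}_n^{-1} J_g(\widehat{\theta}_n^R)^\top, \qquad \bar{H}_n = \frac{1}{n} \nabla_{\theta\theta} \ln L(\ddot{\theta}_n; \xi_n)$$
Given that under the null hypothesis both $\widehat{\theta}_n$ and $\widehat{\theta}_n^R$ converge in probability to $\theta_0$, also $\bar{\theta}_n$ and $\ddot{\theta}_n$ converge in probability to $\theta_0$, because the entries of $\bar{\theta}_n$ and $\ddot{\theta}_n$ lie strictly between the entries of $\widehat{\theta}_n$ and $\widehat{\theta}_n^R$. Moreover,
$$\bar{H}_n \overset{p}{\to} -V^{-1}$$
where $V$ is the asymptotic covariance matrix of $\widehat{\theta}_n$. We had previously assumed that also $\widehat{V}_n$ converges in probability to $V$. Therefore, by the continuous mapping theorem, we have the following results:
$$J_g(\bar{\theta}_n) \overset{p}{\to} J_g(\theta_0), \qquad J_g(\widehat{\theta}_n^R) \overset{p}{\to} J_g(\theta_0), \qquad A_n \overset{p}{\to} -J_g(\theta_0) V J_g(\theta_0)^\top$$
By putting together everything we have derived so far, we can write the score statistic as a sequence of quadratic forms
$$LM_n = Y_n^\top Q_n Y_n$$
where
$$Y_n = \sqrt{n}\, g(\widehat{\theta}_n)$$
and
$$Q_n = \left(A_n^{-1}\right)^\top J_g(\widehat{\theta}_n^R) \, \widehat{V}_n \, J_g(\widehat{\theta}_n^R)^\top A_n^{-1} \overset{p}{\to} \left[J_g(\theta_0) V J_g(\theta_0)^\top\right]^{-1}$$
But in the lecture on the Wald test, we have proved that such a sequence converges in distribution to a Chi-square random variable with a number of degrees of freedom equal to $r$.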
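The asymptotic result can be checked numerically. The sketch below is a small Monte Carlo experiment under assumptions that are not in the lecture: an i.i.d. Bernoulli model with the null hypothesis $\theta_0 = p_0$ (so $r = 1$), a sample size of 200 and 2000 replications. It draws samples under the null and counts how often the score statistic exceeds 3.841, the approximate 95% quantile of a Chi-square distribution with 1 degree of freedom. All function names and default values are hypothetical.

```python
import random


def lm_bernoulli(successes, n, p0):
    """Score statistic for H0: theta = p0 given the number of successes."""
    score = successes / p0 - (n - successes) / (1 - p0)
    return score * score * p0 * (1 - p0) / n


def rejection_rate(n=200, p0=0.3, reps=2000, critical=3.841, seed=0):
    """Fraction of samples simulated under H0 whose statistic exceeds `critical`."""
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    rejections = 0
    for _ in range(reps):
        # Draw n Bernoulli(p0) outcomes and count the successes.
        successes = sum(rng.random() < p0 for _ in range(n))
        if lm_bernoulli(successes, n, p0) > critical:
            rejections += 1
    return rejections / reps


rate = rejection_rate()
print(rate)
```

If the Chi-square approximation is adequate, the printed frequency should be close to 0.05, up to simulation noise and the discreteness of the Bernoulli model.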
In the score test, the null hypothesis is rejected if the score statistic exceeds a pre-determined critical value $z$, that is, if
$$LM_n > z$$
The size of the test can be approximated by its asymptotic value
$$\alpha = P(LM_n > z) \approx 1 - F(z)$$
where $F(z)$ is the distribution function of a Chi-square random variable with $r$ degrees of freedom.
We can choose $z$ so as to achieve a pre-determined size, as follows:
$$z = F^{-1}(1 - \alpha)$$
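The relation between the critical value and the asymptotic size can be sketched in Python for the special case of a single restriction ($r = 1$). The sketch relies on the fact that a Chi-square random variable with 1 degree of freedom is the square of a standard normal one, so its distribution function is $\operatorname{erf}(\sqrt{z/2})$. The function names are hypothetical, and the restriction to $r = 1$ is an assumption made to keep the example within the standard library.

```python
import math


def chi2_cdf_1df(z):
    """Distribution function of a Chi-square variable with 1 degree of freedom."""
    if z <= 0:
        return 0.0
    # P(Z^2 <= z) = P(|Z| <= sqrt(z)) = erf(sqrt(z / 2)) for standard normal Z.
    return math.erf(math.sqrt(z / 2.0))


def asymptotic_size(z):
    """Asymptotic probability of rejecting a true null with critical value z."""
    return 1.0 - chi2_cdf_1df(z)


def reject_null(lm, z):
    """Decision rule of the score test: reject when the statistic exceeds z."""
    return lm > z


print(asymptotic_size(3.841))   # approximately 0.05
print(reject_null(4.0, 3.841))  # True
```

For $r > 1$ the same logic applies with the Chi-square distribution function for $r$ degrees of freedom, which statistical libraries typically provide.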
A simple example of how the score test can be used follows.
Let the parameter space be the set of all $2$-dimensional vectors, that is, $\Theta = \mathbb{R}^2$. Denote the first and second component of the true parameter $\theta_0$ by $\theta_{0,1}$ and $\theta_{0,2}$. Suppose we want to test the restriction [...]. In this case, the function $g$ is a function $g: \mathbb{R}^2 \to \mathbb{R}$ defined by [...] and the Jacobian of $g$ is [...], whose rank is equal to $r = 1$. Note also that it does not depend on $\theta$.
We then maximize the log-likelihood function with respect to [...]. Suppose we obtain the following estimates of the parameter and of the asymptotic covariance matrix: [...], where $n$ is the sample size. Suppose also that the value of the score is [...]. Then, the score statistic is [...].
The statistic has a Chi-square distribution with $r = 1$ degrees of freedom. Suppose we want the size of our test to be $\alpha = 1\%$. Then, the critical value $z$ is
$$z = F^{-1}(1 - \alpha) = F^{-1}(0.99) \approx 6.635$$
where $F$ is the cumulative distribution function of a Chi-square random variable with $1$ degree of freedom and $F^{-1}(0.99)$ can be calculated with any statistical software (we have done it in MATLAB, using the command chi2inv(0.99,1)). Thus, the test statistic exceeds the critical value $z$ and we reject the null hypothesis.
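The critical value for a test of size 1% with 1 degree of freedom can also be reproduced without MATLAB. The following Python sketch uses only the standard library and relies on the fact that a Chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its $(1-\alpha)$ quantile is the square of the standard normal $(1-\alpha/2)$ quantile. The function name is hypothetical and the approach is valid only for 1 degree of freedom.

```python
from statistics import NormalDist


def chi2_critical_value_1df(alpha):
    """Critical value z such that P(X > z) = alpha for X ~ Chi-square(1)."""
    # Two-sided standard normal quantile, then square it.
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return q * q


z = chi2_critical_value_1df(0.01)
print(round(z, 3))  # 6.635
```

For more than one degree of freedom, a library routine such as `scipy.stats.chi2.ppf` plays the same role as MATLAB's `chi2inv`.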