This lecture discusses the main properties of the Normal Linear Regression Model (NLRM), a linear regression model in which the vector of errors of the regression is assumed to have a multivariate normal distribution conditional on the matrix of regressors. The assumption of multivariate normality, together with other assumptions (mainly concerning the covariance matrix of the errors), allows us to derive analytically the distributions of the Ordinary Least Squares (OLS) estimators of the regression coefficients and of several other statistics.
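We use the same notation used in the lecture entitled Properties of the OLS estimator (to which you can refer for more details): the $N\times 1$ vector of observations of the dependent variable is denoted by $y$, the $N\times K$ matrix of regressors (called design matrix) is denoted by $X$, the $N\times 1$ vector of errors is denoted by $\varepsilon$ and the $K\times 1$ vector of regression coefficients is denoted by $\beta$, so that the regression equations can be written in matrix form as
$$y = X\beta + \varepsilon.$$
The OLS estimator $\widehat{\beta}$ is the vector which minimizes the sum of squared residuals
$$\sum_{i=1}^{N}\left(y_i - X_i b\right)^2 = (y - Xb)^\top (y - Xb),$$
where $X_i$ denotes the $i$-th row of $X$, and, if the design matrix has full rank, it can be computed as
$$\widehat{\beta} = (X^\top X)^{-1}X^\top y.$$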
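To make the formula concrete, here is a minimal sketch in Python with NumPy, assuming a simulated design with a constant column. The helper names `ols` and `ssr` are our own hypothetical choices, not part of any library. The sketch solves the normal equations $X^\top X b = X^\top y$ rather than explicitly inverting $X^\top X$, which gives the same $\widehat{\beta}$ when $X$ has full rank but is numerically more stable.

```python
import numpy as np

def ols(X, y):
    """Hypothetical helper: OLS estimate (X'X)^{-1} X'y, assuming X has full column rank.

    Solves the normal equations X'X b = X'y instead of forming the inverse.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

def ssr(X, y, b):
    """Hypothetical helper: sum of squared residuals (y - X b)'(y - X b)."""
    r = y - X @ b
    return r @ r

# Simulated data (illustrative only): N = 50 observations, K = 3 regressors incl. a constant
rng = np.random.default_rng(0)
N, K = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(scale=0.3, size=N)

b_hat = ols(X, y)
print(b_hat)              # close to beta
print(ssr(X, y, b_hat))   # the minimized sum of squared residuals
```

Perturbing `b_hat` in any direction should increase `ssr`, since $\widehat{\beta}$ is the minimizer.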
The assumptions made in a normal linear regression model are:
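the design matrix $X$ has full rank (as a consequence, $X^\top X$ is invertible and the OLS estimator is $\widehat{\beta} = (X^\top X)^{-1}X^\top y$);
conditional on $X$, the vector of errors $\varepsilon$ has a multivariate normal distribution with mean equal to $0$ and covariance matrix equal to $\operatorname{Var}[\varepsilon\mid X] = \sigma^2 I$, where $\sigma^2$ is a positive constant and $I$ is the $N\times N$ identity matrix.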
Note that the assumption that the covariance matrix of $\varepsilon$ is diagonal implies that the entries of $\varepsilon$ are mutually independent (conditional on $X$), that is, $\varepsilon_i$ is independent of $\varepsilon_j$ for $i\neq j$: for a multivariate normal vector, zero correlation implies independence. Moreover, the assumption that all diagonal entries of the covariance matrix are equal implies that all the entries of $\varepsilon$ have the same variance, that is, $\operatorname{Var}[\varepsilon_i\mid X] = \sigma^2$ for any $i$. The latter assumption is often referred to as the "homoscedasticity assumption", and if it is satisfied, we say that the errors are homoscedastic. Conversely, if homoscedasticity does not hold, we say that the errors are heteroscedastic.
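The two assumptions can be illustrated by simulation. The following is a minimal NumPy sketch, assuming a made-up design and parameter values. The function name `simulate_nlrm` is hypothetical. It checks the full-rank assumption with `numpy.linalg.matrix_rank` and then draws independent, homoscedastic normal errors. For contrast, it also draws heteroscedastic errors whose standard deviation depends on a regressor.

```python
import numpy as np

def simulate_nlrm(X, beta, sigma2, rng):
    """Hypothetical helper: draw y = X beta + eps with eps | X ~ N(0, sigma2 * I).

    Raises ValueError if X does not have full column rank (first assumption).
    """
    N, K = X.shape
    if np.linalg.matrix_rank(X) < K:
        raise ValueError("design matrix must have full rank")
    # Independent draws with a common variance: covariance matrix sigma2 * I
    eps = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=N)
    return X @ beta + eps

rng = np.random.default_rng(1)
N = 200
X = np.column_stack([np.ones(N), rng.uniform(0.0, 10.0, size=N)])
y = simulate_nlrm(X, np.array([2.0, 0.5]), sigma2=4.0, rng=rng)

# For contrast, heteroscedastic errors: the standard deviation of eps_i grows
# with the second regressor, so Var[eps | X] is diagonal but not sigma2 * I
eps_het = rng.normal(loc=0.0, scale=0.5 * X[:, 1])
```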
Under the assumptions made in the previous section, the OLS estimator has a multivariate normal distribution, conditional on the design matrix.
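Proposition. In a Normal Linear Regression Model, the OLS estimator $\widehat{\beta}$ has a multivariate normal distribution, conditional on $X$, with mean
$$\operatorname{E}[\widehat{\beta}\mid X] = \beta$$
and covariance matrix
$$\operatorname{Var}[\widehat{\beta}\mid X] = \sigma^2 (X^\top X)^{-1}.$$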
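Proof. First of all, note that
$$\widehat{\beta} = (X^\top X)^{-1}X^\top y = (X^\top X)^{-1}X^\top (X\beta + \varepsilon) = \beta + (X^\top X)^{-1}X^\top \varepsilon.$$
The fact that we are conditioning on $X$ means that we can treat $X$ as a constant matrix. Therefore, conditional on $X$, the OLS estimator $\widehat{\beta}$ is a constant vector plus a linear transformation of a multivariate normal random vector (the vector $\varepsilon$). This implies that $\widehat{\beta}$ is also multivariate normal, with mean
$$\operatorname{E}[\widehat{\beta}\mid X] = \beta + (X^\top X)^{-1}X^\top \operatorname{E}[\varepsilon\mid X] = \beta$$
and variance
$$\operatorname{Var}[\widehat{\beta}\mid X] = (X^\top X)^{-1}X^\top \operatorname{Var}[\varepsilon\mid X]\, X (X^\top X)^{-1} = \sigma^2 (X^\top X)^{-1}X^\top X (X^\top X)^{-1} = \sigma^2 (X^\top X)^{-1}.$$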
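The proposition can be checked by Monte Carlo simulation. The following NumPy sketch assumes arbitrary illustrative values for $N$, $K$, $\beta$ and $\sigma^2$. It keeps one design matrix fixed (mirroring the conditioning on $X$), draws many independent error vectors, and compares the sample mean and covariance of the resulting OLS estimates with $\beta$ and $\sigma^2 (X^\top X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, R = 30, 3, 20000           # sample size, regressors, Monte Carlo replications
sigma2 = 2.0
beta = np.array([1.0, 0.5, -1.0])
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # held fixed: we condition on X

# R independent error vectors, one per row, each ~ N(0, sigma2 * I)
E = rng.normal(scale=np.sqrt(sigma2), size=(R, N))
Y = X @ beta + E                 # shape (R, N): row r is y for replication r

XtX = X.T @ X
B = np.linalg.solve(XtX, X.T @ Y.T).T    # shape (R, K): OLS estimate per replication

mc_mean = B.mean(axis=0)
mc_cov = np.cov(B, rowvar=False)         # rows of B are the replications
theory_cov = sigma2 * np.linalg.inv(XtX)
print(mc_mean)                           # close to beta
print(np.abs(mc_cov - theory_cov).max()) # small relative to the entries of theory_cov
```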
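Note that $\operatorname{E}[\widehat{\beta}\mid X] = \beta$ means that the OLS estimator is unbiased, not only conditionally, but also unconditionally, because by the Law of Iterated Expectations we have that
$$\operatorname{E}[\widehat{\beta}] = \operatorname{E}\left[\operatorname{E}[\widehat{\beta}\mid X]\right] = \operatorname{E}[\beta] = \beta.$$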
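Unconditional unbiasedness can be illustrated by drawing a fresh design matrix in every replication, so that the average is taken over the randomness of both $X$ and $\varepsilon$. This is a minimal sketch, assuming normally distributed regressors plus a constant. The function name `mean_ols_random_design` is hypothetical.

```python
import numpy as np

def mean_ols_random_design(beta, sigma2, N, R, rng):
    """Hypothetical helper: average OLS estimate over R replications, drawing a new X each time."""
    K = beta.size
    total = np.zeros(K)
    for _ in range(R):
        # New design matrix in every replication: X is random, not conditioned on
        X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
        y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=N)
        total += np.linalg.solve(X.T @ X, X.T @ y)
    return total / R

rng = np.random.default_rng(3)
beta = np.array([1.0, -0.5])
print(mean_ols_random_design(beta, sigma2=1.0, N=25, R=5000, rng=rng))  # close to beta
```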
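The variance $\sigma^2$ of the error terms is usually not known. A commonly used estimator of $\sigma^2$ is the adjusted sample variance of the residuals:
$$\widehat{\sigma}^2 = \frac{1}{N-K}\sum_{i=1}^{N}\widehat{\varepsilon}_i^2,$$
where $N$ is the sample size, $K$ is the number of regressors, and the regression residuals $\widehat{\varepsilon}_i$ are the entries of the residual vector
$$\widehat{\varepsilon} = y - X\widehat{\beta}.$$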
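The estimator can be computed directly from the residuals. The following is a minimal NumPy sketch, assuming a tiny hand-checkable data set with four observations and two regressors (a constant and one variable). The function name `adjusted_residual_variance` is hypothetical. The key detail is the divisor $N-K$ rather than $N$.

```python
import numpy as np

def adjusted_residual_variance(X, y):
    """Hypothetical helper: return (beta_hat, residuals, sigma2_hat), sigma2_hat = e'e / (N - K)."""
    N, K = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (N - K)   # divide by N - K, not by N
    return beta_hat, resid, sigma2_hat

# Tiny hand-checkable example: N = 4 observations, K = 2 (constant + one regressor)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 5.0])
beta_hat, resid, sigma2_hat = adjusted_residual_variance(X, y)
# beta_hat = [1.1, 1.1], resid = [-0.1, 0.8, -1.3, 0.6], sigma2_hat = 2.7 / 2 = 1.35
```

The residuals are orthogonal to every column of $X$, that is, $X^\top\widehat{\varepsilon} = 0$, which is a convenient sanity check.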
The properties enjoyed by $\widehat{\sigma}^2$ are summarized by the following proposition.
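Proposition. In a Normal Linear Regression Model, the adjusted sample variance of the residuals $\widehat{\sigma}^2$ is a conditionally unbiased estimator of $\sigma^2$:
$$\operatorname{E}[\widehat{\sigma}^2\mid X] = \sigma^2.$$
Furthermore, conditional on $X$, $\widehat{\sigma}^2$ has a Gamma distribution with parameters $N-K$ and $\sigma^2$ (equivalently, $(N-K)\widehat{\sigma}^2/\sigma^2$ has a Chi-square distribution with $N-K$ degrees of freedom) and it is independent of $\widehat{\beta}$.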
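Proof. Denote by $\widehat{\varepsilon}$ the $N\times 1$ vector of residuals. Remember from the previous proof that the OLS estimator can be written as
$$\widehat{\beta} = \beta + (X^\top X)^{-1}X^\top \varepsilon.$$
As a consequence, we have
$$\widehat{\varepsilon} = y - X\widehat{\beta} = X\beta + \varepsilon - X\beta - X(X^\top X)^{-1}X^\top \varepsilon = \left(I - X(X^\top X)^{-1}X^\top\right)\varepsilon = M\varepsilon.$$
The matrix
$$M = I - X(X^\top X)^{-1}X^\top$$
is clearly symmetric (verify it by taking its transpose). It is also idempotent because
$$MM = I - 2X(X^\top X)^{-1}X^\top + X(X^\top X)^{-1}X^\top X(X^\top X)^{-1}X^\top = I - X(X^\top X)^{-1}X^\top = M.$$
Therefore,
$$\widehat{\sigma}^2 = \frac{\widehat{\varepsilon}^\top\widehat{\varepsilon}}{N-K} = \frac{\varepsilon^\top M^\top M\varepsilon}{N-K} = \frac{\varepsilon^\top M\varepsilon}{N-K} = \frac{\sigma^2}{N-K}\, z^\top M z,$$
where $z = \varepsilon/\sigma$ has, conditional on $X$, a standard multivariate normal distribution, that is, a multivariate normal distribution with zero mean and unit covariance matrix. Since the matrix $M$ is symmetric and idempotent, the quadratic form $z^\top M z$ has a Chi-square distribution with a number of degrees of freedom equal to the trace of $M$ (see the lecture Normal distribution - Quadratic forms). By the cyclic property of the trace, the trace of $M$ is
$$\operatorname{tr}(M) = \operatorname{tr}(I) - \operatorname{tr}\left(X(X^\top X)^{-1}X^\top\right) = N - \operatorname{tr}\left((X^\top X)^{-1}X^\top X\right) = N - \operatorname{tr}(I_K) = N - K,$$
where $I_K$ is the $K\times K$ identity matrix. Since the expected value of a Chi-square random variable is equal to its number of degrees of freedom, we have
$$\operatorname{E}[\widehat{\sigma}^2\mid X] = \frac{\sigma^2}{N-K}\operatorname{E}[z^\top M z\mid X] = \frac{\sigma^2}{N-K}(N-K) = \sigma^2.$$
Moreover, the fact that the quadratic form $z^\top M z$ has a Chi-square distribution with $N-K$ degrees of freedom implies that the sample variance
$$\widehat{\sigma}^2 = \frac{\sigma^2}{N-K}\, z^\top M z$$
has a Gamma distribution with parameters $N-K$ and $\sigma^2$ (see the lecture on the Gamma distribution for a proof of this fact).
To conclude, we need to prove that $\widehat{\sigma}^2$ is independent of $\widehat{\beta}$. Since
$$\widehat{\beta} = \beta + \sigma (X^\top X)^{-1}X^\top z$$
and
$$\widehat{\sigma}^2 = \frac{\sigma^2}{N-K}\, z^\top M z,$$
$\widehat{\beta}$ and $\widehat{\sigma}^2$ are functions of the same multivariate normal random vector $z$: the former through the linear form $(X^\top X)^{-1}X^\top z$, the latter through the quadratic form $z^\top M z$. Therefore, by standard results on the independence of linear and quadratic forms in a standard normal vector, $\widehat{\beta}$ and $\widehat{\sigma}^2$ are independent if $(X^\top X)^{-1}X^\top$ and $M$ are orthogonal. In order to check their orthogonality, we only need to verify that the product between $(X^\top X)^{-1}X^\top$ and $M$ is zero:
$$(X^\top X)^{-1}X^\top M = (X^\top X)^{-1}X^\top - (X^\top X)^{-1}X^\top X(X^\top X)^{-1}X^\top = (X^\top X)^{-1}X^\top - (X^\top X)^{-1}X^\top = 0.$$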
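The matrix facts used in the proof (symmetry and idempotence of $M$, $\operatorname{tr}(M) = N-K$, the orthogonality $(X^\top X)^{-1}X^\top M = 0$, and $\widehat{\varepsilon} = M\varepsilon$) can be verified numerically. The following is a minimal NumPy sketch, assuming a randomly generated $8\times 3$ design. The function name `annihilator` is our own hypothetical label for $M$.

```python
import numpy as np

def annihilator(X):
    """Hypothetical helper: M = I - X (X'X)^{-1} X', the matrix M of the proof."""
    N = X.shape[0]
    return np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

rng = np.random.default_rng(5)
N, K = 8, 3
X = rng.normal(size=(N, K))
M = annihilator(X)
A = np.linalg.solve(X.T @ X, X.T)      # (X'X)^{-1} X', the matrix multiplying eps in beta_hat - beta

print(np.allclose(M, M.T))             # symmetric
print(np.allclose(M @ M, M))           # idempotent
print(np.trace(M))                     # approximately N - K = 5
print(np.allclose(A @ M, 0.0))         # orthogonality used for independence

# Residuals equal M @ eps when y = X beta + eps
beta = np.array([1.0, 2.0, 3.0])
eps = rng.normal(size=N)
y = X @ beta + eps
resid = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(resid, M @ eps))
```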
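Note that, also in this case, the proposed estimator $\widehat{\sigma}^2$ is unbiased not only conditionally, but also unconditionally because, by the Law of Iterated Expectations, we have that
$$\operatorname{E}[\widehat{\sigma}^2] = \operatorname{E}\left[\operatorname{E}[\widehat{\sigma}^2\mid X]\right] = \operatorname{E}[\sigma^2] = \sigma^2.$$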
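The distributional claims about $\widehat{\sigma}^2$ can be checked by simulation, conditional on a fixed design. This is a minimal NumPy sketch, assuming arbitrary illustrative values of $N$, $K$, $\beta$ and $\sigma^2$. It compares the Monte Carlo mean of $\widehat{\sigma}^2$ with $\sigma^2$ and the variance of $(N-K)\widehat{\sigma}^2/\sigma^2$ with $2(N-K)$, the variance of a Chi-square with $N-K$ degrees of freedom. It also reports the sample correlation between $\widehat{\sigma}^2$ and one entry of $\widehat{\beta}$. A near-zero correlation is consistent with independence but, on its own, is a weaker property.

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, R = 12, 3, 50000
sigma2 = 1.5
beta = np.array([0.0, 1.0, 2.0])
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # fixed design

E = rng.normal(scale=np.sqrt(sigma2), size=(R, N))
Y = X @ beta + E
B = np.linalg.solve(X.T @ X, X.T @ Y.T).T   # (R, K) OLS estimates
resid = Y - B @ X.T                          # (R, N) residuals
s2 = (resid ** 2).sum(axis=1) / (N - K)      # adjusted sample variance per replication

q = (N - K) * s2 / sigma2     # should behave like a Chi-square with N - K degrees of freedom
print(s2.mean(), sigma2)                 # mean of s2 vs sigma^2
print(q.var(), 2 * (N - K))              # Chi-square variance is 2 * degrees of freedom
print(np.corrcoef(s2, B[:, 1])[0, 1])    # near 0, consistent with independence
```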
We have already proved that in the Normal Linear Regression Model the conditional covariance matrix of the OLS estimator (conditional on $X$) is
$$\operatorname{Var}[\widehat{\beta}\mid X] = \sigma^2 (X^\top X)^{-1}.$$
In practice, however, this quantity is not known exactly because the variance of the error terms, that is, $\sigma^2$, is unknown. We can nonetheless replace its unknown value with the estimator $\widehat{\sigma}^2$ proposed above (the adjusted sample variance of the residuals), so as to obtain an estimator of the covariance matrix of $\widehat{\beta}$:
$$\widehat{\operatorname{Var}}[\widehat{\beta}] = \widehat{\sigma}^2 (X^\top X)^{-1}.$$
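The plug-in estimator is straightforward to compute. The following is a minimal NumPy sketch, assuming the same tiny four-observation data set used for illustration elsewhere. The function name `ols_with_cov` is hypothetical. Here we form $(X^\top X)^{-1}$ explicitly because the estimated covariance matrix needs it anyway. The square roots of its diagonal entries are the usual standard errors of the coefficients.

```python
import numpy as np

def ols_with_cov(X, y):
    """Hypothetical helper: OLS estimate, estimated covariance sigma2_hat (X'X)^{-1}, standard errors."""
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (N - K)     # adjusted sample variance of the residuals
    cov_hat = sigma2_hat * XtX_inv           # replaces the unknown sigma^2 with sigma2_hat
    se = np.sqrt(np.diag(cov_hat))
    return beta_hat, cov_hat, se

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 5.0])
beta_hat, cov_hat, se = ols_with_cov(X, y)
# Here (X'X)^{-1} = [[0.7, -0.3], [-0.3, 0.2]] and sigma2_hat = 1.35,
# so cov_hat = [[0.945, -0.405], [-0.405, 0.27]]
```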
It can be proved that the OLS estimators of the coefficients of a Normal Linear Regression Model are equal to the maximum likelihood estimators. By contrast, the maximum likelihood estimator of the variance of the error terms is different from the estimator $\widehat{\sigma}^2$ derived above. For proofs of these two facts, see the lecture entitled Linear Regression - Maximum likelihood estimation.
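For comparison, the maximum likelihood estimator of $\sigma^2$ divides the sum of squared residuals by $N$ instead of $N-K$, so it equals $\frac{N-K}{N}\widehat{\sigma}^2$. The following minimal NumPy sketch uses the same tiny illustrative data set as above. It computes both estimators and evaluates the Gaussian log-likelihood with the hypothetical helper `gaussian_loglik` as a local sanity check (not a proof): moving $\beta$ away from the OLS estimate, or replacing the ML variance with $\widehat{\sigma}^2$, lowers the log-likelihood.

```python
import numpy as np

def gaussian_loglik(X, y, b, s2):
    """Hypothetical helper: log-likelihood of y given X when y = X b + eps, eps ~ N(0, s2 * I)."""
    N = y.size
    r = y - X @ b
    return -0.5 * N * np.log(2 * np.pi * s2) - (r @ r) / (2 * s2)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 5.0])
N, K = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate, also the ML estimate of beta
resid = y - X @ beta_hat
s2_adjusted = resid @ resid / (N - K)          # unbiased estimator of this lecture: 1.35
s2_ml = resid @ resid / N                      # ML estimator of sigma^2: 0.675

print(gaussian_loglik(X, y, beta_hat, s2_ml))
print(gaussian_loglik(X, y, beta_hat, s2_adjusted))   # smaller than the line above
```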
Please cite as:
Taboga, Marco (2021). "The normal linear regression model", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/normal-linear-regression-model.