 StatLect

# Probit classification model (or probit regression)

This lecture deals with the probit model, a binary classification model in which the conditional probability of one of the two possible realizations of the output variable is equal to a linear combination of the inputs, transformed by the cumulative distribution function of the standard normal distribution.

## Model specification

Assume that a sample of data $(y_i, x_i)$, for $i = 1, \ldots, N$, is observed, where:

• $y_i$ is an output variable that can take only two values, either $1$ or $0$ (it is a Bernoulli random variable);

• $x_i$ is a $1 \times K$ vector of inputs.

The conditional probability that the output $y_i$ is equal to $1$, given the inputs $x_i$, is assumed to be

$$P(y_i = 1 \mid x_i) = F(x_i \beta)$$

where $F$ is the cumulative distribution function of the standard normal distribution and $\beta$ is a $K \times 1$ vector of coefficients.
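As a small illustration of the formula above, the following sketch evaluates the probit probability for one input vector. The coefficient and input values are hypothetical, and treating the first input as a constant is an assumption made only for this example.

```python
from statistics import NormalDist


def probit_probability(x, beta):
    """Return P(y = 1 | x) = F(x beta) for a single input vector x."""
    # Linear combination of the inputs (the argument of F).
    index = sum(x_j * b_j for x_j, b_j in zip(x, beta))
    # F is the CDF of the standard normal distribution.
    return NormalDist().cdf(index)


# Hypothetical values, chosen only for illustration.
beta = [0.5, -1.0]   # coefficients
x = [1.0, 0.5]       # inputs (first entry assumed to be a constant)

p1 = probit_probability(x, beta)  # P(y = 1 | x)
p0 = 1.0 - p1                     # P(y = 0 | x)
print(p1, p0)
```

With these values the linear combination is zero, so both outcomes receive probability one half.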

Moreover, if $y_i$ is not equal to $1$, then it is equal to $0$ (no other values are possible), and the probabilities of the two values need to sum up to $1$, so that

$$P(y_i = 0 \mid x_i) = 1 - F(x_i \beta)$$

## Interpretation

The interpretation of the probit model is very similar to that of the logit model. You are advised to read the comments about the interpretation of the latter in the lecture entitled Logistic classification model.

## The probit model as a latent variable model

Like the logit model, the probit model can be written as a latent variable model.

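Define a latent variable

$$z_i = x_i \beta + \varepsilon_i \qquad (1)$$

where $\varepsilon_i$ is a random error term having a standard normal distribution. The output $y_i$ is linked to the latent variable by the following relationship:

$$y_i = \begin{cases} 1 & \text{if } z_i \geq 0 \\ 0 & \text{if } z_i < 0 \end{cases} \qquad (2)$$

We have that

$$\begin{aligned} P(y_i = 1 \mid x_i) &= P(z_i \geq 0 \mid x_i) \\ &= P(x_i \beta + \varepsilon_i \geq 0 \mid x_i) \\ &= P(-\varepsilon_i \leq x_i \beta \mid x_i) \\ &= F(x_i \beta) \end{aligned}$$

where the last equality holds because, by the symmetry of the standard normal distribution, $-\varepsilon_i$ has the same distribution as $\varepsilon_i$. Therefore, the latent variable model specified by (1) and (2) assigns to the inputs the same conditional distributions assigned by the probit model.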
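The equivalence can also be checked numerically. The sketch below simulates the latent variable model for a single, hypothetical value of the linear combination of the inputs and compares the share of simulated outputs equal to one with the standard normal CDF evaluated at that value. The number of draws and the seed are arbitrary choices.

```python
import random
from statistics import NormalDist


def simulate_latent_frequency(index, n_draws=200_000, seed=0):
    """Share of simulated outputs equal to 1 when the latent variable is
    index + error, with a standard normal error."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(n_draws):
        z = index + rng.gauss(0.0, 1.0)  # latent variable, as in (1)
        if z >= 0:                       # output rule, as in (2)
            ones += 1
    return ones / n_draws


index = 0.8  # hypothetical value of the linear combination of the inputs
frequency = simulate_latent_frequency(index)
probability = NormalDist().cdf(index)  # probability implied by the probit model
print(frequency, probability)
```

The two printed numbers should agree up to simulation error, which shrinks as the number of draws grows.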

## Estimation by maximum likelihood

The vector of coefficients $\beta$ can be estimated by maximum likelihood (ML).

We assume that the observations in the sample are independently and identically distributed (IID) and that the $N \times K$ matrix of inputs $X$, defined by

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix}$$

has full rank.

In a separate lecture (ML estimation of the probit model), we demonstrate that the ML estimator $\hat{\beta}$ can be found (if it exists) with the following iterative procedure.

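Starting from an initial guess of the solution $\hat{\beta}_0$ (e.g., $\hat{\beta}_0 = 0$), we generate a sequence of guesses

$$\hat{\beta}_t = \hat{\beta}_{t-1} + (X^\top W_{t-1} X)^{-1} X^\top \lambda_{t-1}$$

where $W_{t-1}$ is an $N \times N$ diagonal matrix and $\lambda_{t-1}$ is an $N \times 1$ vector. They are calculated as follows:

• compute $$q_i = 2 y_i - 1$$

• denote by $f$ the probability density function of the standard normal distribution, and compute the entries of the vector $\lambda_{t-1}$: $$\lambda_{i,t-1} = \frac{q_i \, f(q_i x_i \hat{\beta}_{t-1})}{F(q_i x_i \hat{\beta}_{t-1})}$$

• compute the diagonal matrix $W_{t-1}$, whose $i$-th diagonal entry is $$W_{ii,t-1} = \lambda_{i,t-1} \left( \lambda_{i,t-1} + x_i \hat{\beta}_{t-1} \right)$$

The iterative procedure stops when numerical convergence is achieved, that is, when the difference between two successive guesses $\hat{\beta}_t$ and $\hat{\beta}_{t-1}$ is so small that we can ignore it.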

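If $\hat{\beta}_T$ is the guess produced by the last step of the iterative procedure, then the maximum likelihood estimator is

$$\hat{\beta} = \hat{\beta}_T$$

and its asymptotic covariance matrix (that of the limiting distribution of $\sqrt{N}(\hat{\beta} - \beta)$) is estimated by

$$\hat{V} = N (X^\top W X)^{-1}$$

where $W = W_T$.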

As a consequence, the distribution of $\hat{\beta}$ can be approximated by a normal distribution with mean equal to the true parameter and covariance matrix $(X^\top W X)^{-1}$.
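The procedure described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a production implementation: the function names, the tolerance, the iteration cap and the simulated data are all hypothetical, and there are no safeguards against numerical underflow of the normal CDF or against samples for which the estimator does not exist.

```python
import math
import numpy as np

SQRT_2 = math.sqrt(2.0)
SQRT_2PI = math.sqrt(2.0 * math.pi)
_erf = np.vectorize(math.erf)  # element-wise error function


def norm_pdf(u):
    """Standard normal density f(u)."""
    return np.exp(-0.5 * u ** 2) / SQRT_2PI


def norm_cdf(u):
    """Standard normal CDF F(u), computed through the error function."""
    return 0.5 * (1.0 + _erf(u / SQRT_2))


def lambda_and_weights(X, y, beta):
    """Return the vector lambda and the diagonal of W at the guess beta."""
    q = 2.0 * y - 1.0                 # +1 when y = 1, -1 when y = 0
    index = X @ beta                  # linear combinations x_i beta
    lam = q * norm_pdf(q * index) / norm_cdf(q * index)
    w = lam * (lam + index)           # diagonal entries of W
    return lam, w


def probit_newton(X, y, tol=1e-10, max_iter=100):
    """Iterate the update until two successive guesses (almost) coincide."""
    beta = np.zeros(X.shape[1])       # initial guess: all coefficients zero
    for _ in range(max_iter):
        lam, w = lambda_and_weights(X, y, beta)
        xtwx = X.T @ (w[:, None] * X)              # X' W X
        step = np.linalg.solve(xtwx, X.T @ lam)    # (X' W X)^{-1} X' lambda
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Approximate covariance matrix of the estimator, evaluated at the last guess.
    _, w = lambda_and_weights(X, y, beta)
    cov = np.linalg.inv(X.T @ (w[:, None] * X))
    return beta, cov


# Simulated data (hypothetical true coefficients, first input is a constant).
rng = np.random.default_rng(0)
n_obs = 2000
X = np.column_stack([np.ones(n_obs), rng.standard_normal(n_obs)])
beta_true = np.array([0.3, -0.7])
y = (X @ beta_true + rng.standard_normal(n_obs) >= 0).astype(float)

beta_hat, cov_hat = probit_newton(X, y)
std_errors = np.sqrt(np.diag(cov_hat))
print(beta_hat, std_errors)
```

The linear system is solved with `np.linalg.solve` rather than by forming an explicit inverse at every step; the inverse is computed only once, at the end, because it is needed as the covariance estimate.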

## Hypothesis testing

When we estimate the coefficients of a probit classification model by maximum likelihood (see previous section), we can carry out hypothesis tests based on maximum likelihood procedures (e.g., Wald, Likelihood Ratio, Lagrange Multiplier) to test a null hypothesis about the coefficients.

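Furthermore, we can set up a z test to test a restriction on a single coefficient:

$$H_0: \beta_k = b$$

where $\beta_k$ is the $k$-th entry of the vector of coefficients $\beta$ and $b \in \mathbb{R}$.

The test statistic is

$$z = \frac{\hat{\beta}_k - b}{\sqrt{\left[ (X^\top W X)^{-1} \right]_{kk}}}$$

where $\hat{\beta}_k$ is the $k$-th entry of $\hat{\beta}$ and $\left[ (X^\top W X)^{-1} \right]_{kk}$ is the $k$-th entry on the diagonal of the matrix $(X^\top W X)^{-1}$.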

Since $\hat{\beta}$ is asymptotically normal and $N (X^\top W X)^{-1}$ is a consistent estimator of the asymptotic covariance matrix of $\hat{\beta}$, the test statistic $z$ converges in distribution to a standard normal distribution under the null hypothesis (the proof is identical to the proof we have provided for the asymptotic normality of the z statistic in the lecture on the logit model).

By approximating the distribution of $z$ with its asymptotic one (a standard normal), we can derive critical values (depending on the desired size) and carry out the test.
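As an illustration, the sketch below carries out the z test for a single coefficient. It assumes a two-sided alternative, which the text above does not specify, and the estimate, its approximate variance and the size of the test are hypothetical numbers.

```python
import math
from statistics import NormalDist


def z_test(beta_hat_k, variance_k, b=0.0, size=0.05):
    """Two-sided z test of the restriction beta_k = b.

    variance_k is the k-th diagonal entry of the approximate covariance
    matrix of the estimator.
    """
    z = (beta_hat_k - b) / math.sqrt(variance_k)
    # Critical value from the standard normal approximation.
    critical_value = NormalDist().inv_cdf(1.0 - size / 2.0)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    reject = abs(z) > critical_value
    return z, critical_value, p_value, reject


# Hypothetical estimate and variance, chosen only for illustration.
z, critical_value, p_value, reject = z_test(beta_hat_k=0.10, variance_k=0.0025)
print(z, critical_value, p_value, reject)
```

Here the standard error is 0.05, so the statistic is about 2, slightly above the 5% two-sided critical value of about 1.96, and the null hypothesis is rejected.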