The Expectation-Maximization (EM) algorithm is a recursive algorithm that can be used to search for the maximum likelihood estimators of model parameters when the model includes some unobservable variables (also called latent variables).
In a latent-variable model, there are two vectors:
the observed data ;
the vector of unobserved variables .
We denote their joint probability bywhere is a vector of parameters to be estimated.
The joint distribution of and is usually obtained by separately specifying the conditional distribution of the observed variablesand the marginal distribution of the latent variables
As a consequence, the joint distribution is
In the context of latent-variable models, the joint distribution is often called complete likelihood.
We can also derive the conditional distribution of as
If is continuous the sums above are replaced by integrals.
The maximum likelihood estimator (MLE) of the parameter is the vector that solves the maximization problem
This problem does not usually have an analytical solution. As a consequence, we need to solve it with an iterative procedure that starts from an initial guess of the solution and produces a sequencethat hopefully converges to the solution.
We say hopefully because we are often unable to derive theoretical guarantees about the numerical convergence of likelihood maximization algorithms, especially for latent-variable models.
The EM algorithm is one of the iterative procedures that can be used to search for a solution when we are dealing with a latent-variable model specified as above.
Starting from an initial guess , the -th iteration of the EM algorithm consists of the following steps:
use the parameter value found in the previous iteration to compute the conditional probabilitiesfor each ;
use the conditional probabilities derived in step 1 to compute the expected value of the complete log-likelihood:
solve the maximization problem
if the parameter update is smaller than a pre-specified threshold , that is, if stop the algorithm, else return to step 1.
Steps 1 and 2 are collectively called the Expectation step, while step 3 is called the Maximization step. Hence the name of the algorithm (Expectation-Maximization).
Step 4 is a stopping criterion: we stop the algorithm when there are no significant changes in the parameter.
In general, the algorithm is not guaranteed to converge to a global maximum of the likelihood.
However, it has the important property that the likelihood is guaranteed to increase at each iteration:for every .
The conditional probability formulaimplies thatIf we multiply both sides by , we obtainThen, we can sum over the support of :Since the log-likelihood does not depend on and we haveMoreover, Thus,Setting , we obtainSubtracting (2) from (1), we getBy Jensen's inequality, we haveTherefore,which, for , becomes But becauseThereforeor
The EM algorithm is particularly advantageous when the maximization problem in the Maximization step has a closed-form solution.
This happens, for example, when the latent-variable model is a mixture of multivariate normal distributions.
As we said, there is no guarantee that the EM algorithm converges to a global maximum of the likelihood.
If we suspect that the likelihood may have multiple local minima, we should use the multiple starts approach. In other words, we should run the EM algorithm several times with different starting values .
The EM algorithm is often used to estimate the parameters of a mixture of normal distributions (also called Gaussian mixture). See this lecture for details.
Please cite as:
Taboga, Marco (2021). "EM algorithm", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/EM-algorithm.
Most of the learning materials found on this website are now available in a traditional textbook format.