What is the Expectation-Maximization Algorithm?

In statistics, the expectation-maximization (EM) algorithm is an iterative method for finding (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters. It is used when the likelihood equations cannot be solved directly, typically because the model depends on unobserved latent variables.

The EM algorithm addresses a limitation of the plain maximum likelihood method, which assumes that the dataset is complete or fully observed, that is, that all variables relevant to the model have been measured. In practice, however, there are often unobserved factors (latent variables).

The method alternates between two steps, starting from an initial (often random) guess of the parameters. The expectation (E) step uses the current parameter estimates to compute the expected values of the missing or latent data. The maximization (M) step then re-estimates the parameters to maximize the likelihood given those expected values, producing a better model. The two steps repeat until the estimates converge to a fixed point.
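As an illustration, the alternation described above can be sketched for a simple case: fitting a two-component, one-dimensional Gaussian mixture, where the latent variable is which component generated each point. This is a minimal sketch, not LogicPlum's implementation; the function names (`em_gmm_1d`, `norm_pdf`) and the fixed iteration count are assumptions made for the example.

```python
import numpy as np

def norm_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, n_iter=50, seed=0):
    """Fit a two-component 1-D Gaussian mixture to data x via EM."""
    rng = np.random.default_rng(seed)
    # Initial guess: equal mixing weight, two data points as means.
    pi = 0.5
    mu = rng.choice(x, size=2, replace=False)
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E step: expected component memberships (responsibilities)
        # given the current parameters.
        p1 = pi * norm_pdf(x, mu[0], var[0])
        p2 = (1 - pi) * norm_pdf(x, mu[1], var[1])
        r = p1 / (p1 + p2)
        # M step: re-estimate parameters from the weighted data,
        # maximizing the expected log-likelihood.
        pi = r.mean()
        mu[0] = (r * x).sum() / r.sum()
        mu[1] = ((1 - r) * x).sum() / (1 - r).sum()
        var[0] = (r * (x - mu[0]) ** 2).sum() / r.sum()
        var[1] = ((1 - r) * (x - mu[1]) ** 2).sum() / (1 - r).sum()
    return pi, mu, var
```

Run on data drawn from two well-separated normals, the loop recovers the two means; each iteration is guaranteed not to decrease the likelihood, which is why the process converges to a (local) fixed point.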

The EM algorithm was presented and named in a paper by Arthur Dempster, Nan Laird, and Donald Rubin in 1977.


Why is the Expectation-Maximization Algorithm Important?

The EM algorithm is widely used. In general, it is applied in modeling problems where unknown parameters must be estimated despite missing or hidden data.

There are many examples of its application. In machine learning, it is mostly used in unsupervised learning techniques, such as density estimation and clustering. In psychology, it is applied in item response theory models. In finance, it is used in portfolio management. In medicine, it is employed in tools for image reconstruction. And, in engineering, it is used in system identification, that is, in estimating the properties of a system from measurements.


Expectation-Maximization Algorithm + LogicPlum

Modeling can become difficult, particularly in the presence of big data and unknown influencing factors. Applying correction methods, such as EM, can be challenging for those without the necessary mathematical knowledge.

LogicPlum’s platform provides a way for practitioners to create cutting-edge solutions without having to delve into statistics and machine learning. The tool uses automation to build and test hundreds of candidate solutions and then selects the best one as measured by a chosen metric. Modelers can thus focus on applying their expertise to result interpretation and forecasting.


Guide to Further Reading

For those interested in the original paper:

Dempster, A., Laird, N., & Rubin, D. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1-38. Available at https://www.jstor.org/stable/2984875?seq=1

© 2020 LogicPlum, Inc. All rights reserved.