Mixture Models and EM
How do we model data with a probability distribution? For example, if the data looks roughly circular or symmetric, we may model it with a Gaussian distribution $\mathcal{N}(\mu, \Sigma)$,
with the following density function:

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$$

Given i.i.d. data points $x_1, \dots, x_n \in \mathbb{R}^d$, we can estimate $\mu$ and $\Sigma$ by maximum likelihood. The solution is the sample mean and sample covariance:

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n (x_i - \hat{\mu})(x_i - \hat{\mu})^\top$$
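As a quick illustration, here is a minimal NumPy sketch of these two estimators (the function and variable names are my own):

```python
import numpy as np

def gaussian_mle(X):
    """MLE for a multivariate Gaussian; X is an (n, d) array of i.i.d. samples."""
    n = X.shape[0]
    mu = X.mean(axis=0)                  # sample mean, shape (d,)
    centered = X - mu
    Sigma = centered.T @ centered / n    # MLE covariance (note 1/n, not 1/(n-1))
    return mu, Sigma

# Example: fit a Gaussian to synthetic 2-D data
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, -2.0], cov=[[2.0, 0.5], [0.5, 1.0]], size=500)
mu_hat, Sigma_hat = gaussian_mle(X)
```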
But what about data that looks like this?
Here, a single Gaussian is a poor model: a Gaussian density is unimodal, so it cannot fit multi-modal data. One solution is a mixture of Gaussians:
The following plot shows an example of a Gaussian mixture distribution in one dimension: three Gaussians (each scaled by a mixing coefficient) in blue, and their sum in red.

Mixture of Gaussians with $K = 3$ components.
Let’s formalize the Gaussian mixture model:
define a latent variable $z \in \{0, 1\}^K$: a one-hot vector with $z_k \in \{0, 1\}$ and $\sum_{k=1}^K z_k = 1$, indicating which component generated $x$.
Graphical model: $z \to x$ (the latent component assignment $z$ generates the observation $x$).
- define the marginal on $z$:

$$p(z_k = 1) = \pi_k, \qquad \text{where } 0 \le \pi_k \le 1 \text{ and } \sum_{k=1}^K \pi_k = 1.$$

Since only one $z_k$ is equal to $1$, we can write this compactly as $p(z) = \prod_{k=1}^K \pi_k^{z_k}$.
- define the conditional:

$$p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \text{i.e. } p(x \mid z) = \prod_{k=1}^K \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k}.$$
- define the joint probability:

$$p(x, z) = p(z)\, p(x \mid z) = \prod_{k=1}^K \big( \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k) \big)^{z_k},$$

or, equivalently, $p(x, z_k = 1) = \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$.
- the marginal on $x$ is a mixture of Gaussians:

$$p(x) = \sum_z p(x, z) = \sum_{k=1}^K \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k).$$
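To make the marginal concrete, here is a small sketch that evaluates $p(x) = \sum_k \pi_k\, \mathcal{N}(x \mid \mu_k, \sigma_k^2)$ for a hypothetical 1-D, three-component mixture like the one plotted above (assuming SciPy is available; the parameter values are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameters for a 1-D mixture with K = 3 components
pis    = np.array([0.3, 0.5, 0.2])   # mixing coefficients, sum to 1
mus    = np.array([-2.0, 0.0, 3.0])  # component means
sigmas = np.array([0.5, 1.0, 0.8])   # component standard deviations

def mixture_pdf(x):
    """p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2), evaluated pointwise."""
    return sum(pi * norm.pdf(x, loc=mu, scale=s)
               for pi, mu, s in zip(pis, mus, sigmas))

xs = np.linspace(-5, 6, 200)
density = mixture_pdf(xs)            # the red curve: the sum of the scaled blue Gaussians
```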
Therefore, to sample from a mixture of Gaussians we can do the following (see the sketch after this list):

- draw a sample $z \sim p(z)$, so with probability $\pi_k$ we are in the $k$-th component.
- given $z_k = 1$, draw $x \sim \mathcal{N}(\mu_k, \Sigma_k)$.
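A minimal sketch of this two-step (ancestral) sampling procedure, again with hypothetical 1-D parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixture parameters (K = 3, one-dimensional)
pis    = np.array([0.3, 0.5, 0.2])
mus    = np.array([-2.0, 0.0, 3.0])
sigmas = np.array([0.5, 1.0, 0.8])

def sample_mixture(n):
    # Step 1: draw z ~ Categorical(pi); z[i] = k with probability pi_k
    z = rng.choice(len(pis), size=n, p=pis)
    # Step 2: given z[i] = k, draw x[i] ~ N(mu_k, sigma_k^2)
    return rng.normal(loc=mus[z], scale=sigmas[z])

samples = sample_mixture(1000)
```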
If we think of $\pi_k = p(z_k = 1)$ as the prior probability that component $k$ is chosen, then the corresponding posterior probability after observing $x$ is the responsibility

$$\gamma(z_k) := p(z_k = 1 \mid x) = \frac{\pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j\, \mathcal{N}(x \mid \mu_j, \Sigma_j)},$$

which plays a central role in the estimation procedure below.
Maximum Likelihood Estimation for Gaussian Mixtures
Given observations $x_1, \dots, x_n$, we introduce a latent variable $z_i$ for each $x_i$. The joint probability of the data is

$$p(x_1, \dots, x_n) = \prod_{i=1}^n p(x_i) = \prod_{i=1}^n \sum_{k=1}^K \pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k),$$

so the MLE estimator maximizes the log-likelihood:

$$\ell(\pi, \mu, \Sigma) = \sum_{i=1}^n \log \sum_{k=1}^K \pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k).$$

There is no closed-form solution, since the log of a sum does not decouple across components, but the following observations suggest an iterative scheme:
- suppose we fix $\pi$ and $\Sigma$; then we can optimize over $\mu$. To find a maximizer, set the gradient to $0$:

$$\frac{\partial \ell}{\partial \mu_k} = \sum_{i=1}^n \gamma(z_{ik})\, \Sigma_k^{-1} (x_i - \mu_k) = 0 \quad \Longrightarrow \quad \mu_k = \frac{\sum_{i=1}^n \gamma(z_{ik})\, x_i}{\sum_{i=1}^n \gamma(z_{ik})},$$

where $\gamma(z_{ik}) = p(z_{ik} = 1 \mid x_i)$ is the responsibility of component $k$ for point $x_i$, so each mean is a responsibility-weighted average of the data.
- Similarly, if we fix $\pi$ and $\mu$, we can optimize over $\Sigma$:

$$\Sigma_k = \frac{\sum_{i=1}^n \gamma(z_{ik})\, (x_i - \mu_k)(x_i - \mu_k)^\top}{\sum_{i=1}^n \gamma(z_{ik})}.$$
- If we fix $\mu$ and $\Sigma$, we can then optimize over $\pi$ (subject to $\sum_k \pi_k = 1$, via a Lagrange multiplier):

$$\pi_k = \frac{n_k}{n}, \qquad \text{where } n_k = \sum_{i=1}^n \gamma(z_{ik}).$$

Each update depends on the responsibilities $\gamma(z_{ik})$, which in turn depend on the parameters, so we iterate: compute the responsibilities with the current parameters (the E-step), then re-estimate $\mu$, $\Sigma$, and $\pi$ from the responsibilities (the M-step). This is the EM algorithm for Gaussian mixtures.
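Below is a minimal NumPy sketch of the resulting EM loop (the function name `em_gmm`, the initialization scheme, and the small diagonal jitter are my own choices, not part of the derivation above; SciPy is assumed for the Gaussian density):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a Gaussian mixture; X is an (n, d) data array. A sketch, not production code."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Naive initialization: random data points as means, shared sample covariance, uniform weights
    mus = X[rng.choice(n, size=K, replace=False)]
    Sigmas = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    pis = np.full(K, 1.0 / K)

    for _ in range(n_iters):
        # E-step: responsibilities gamma[i, k] = p(z_ik = 1 | x_i)
        dens = np.stack([pis[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                         for k in range(K)], axis=1)            # shape (n, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M-step: the three fixed-point updates derived above
        Nk = gamma.sum(axis=0)                                  # effective counts n_k
        mus = (gamma.T @ X) / Nk[:, None]                       # responsibility-weighted means
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        pis = Nk / n                                            # mixing coefficients
    return pis, mus, Sigmas
```

A useful sanity check for an implementation like this is that the log-likelihood $\ell$ is guaranteed to be non-decreasing across EM iterations.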
The following plot illustrates the EM algorithm (Bishop Fig. 9.8).
We covered this material in Introduction to Machine Learning (CPSC 481/581) at Yale University, taught by Andre Wibisono, where I was a TF (jointly with Siddharth Mitra).