The maximum likelihood, maximum a posteriori and expectation-maximisation estimation methods
Imagine you have a statistical model, that is, a mathematical description of your data which depends on some parameters $\theta$. The likelihood function, usually indicated as $L$, is a function of these parameters and represents the probability of observing evidence (observed data) $E$ given said parameters:
$$L = P(E \mid \theta)$$
Because it is a function of the parameters given the outcome, you write
$$L(\theta \mid E)$$
The difference between probability and likelihood is quite subtle: in common language the two words are casually swapped, but they represent different things. The probability measures the outcomes observed as a function of the parameters $\theta$ of the underlying model. But in reality $\theta$ are unknown and in fact we go through the reverse process: estimating the parameters given the evidence we observe. For this, we use the likelihood, which is defined as above: numerically it equals the probability of the evidence, but it is read as a function of the parameters, and we look for the parameters that maximise it. This is exactly what the ML estimation does, as per below.
Bear in mind that the likelihood is a function of $\theta$, with the observed data held fixed.
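To make the distinction concrete, here is a minimal sketch using coin flips. The model (independent flips with heads probability `p`), the value `p_fixed` and the `observed` outcome are assumptions chosen for illustration, not taken from the text. The same expression sums to 1 when read as a function of the outcome, but does not integrate to 1 when read as a function of the parameter.

```python
from itertools import product

def prob(outcome, p):
    """P(outcome | p) for independent coin flips with heads probability p."""
    result = 1.0
    for x in outcome:
        result *= p if x == 1 else 1.0 - p
    return result

# Read as a function of the outcome (p fixed), it is a probability:
# summing over all 2^3 possible outcomes gives 1.
p_fixed = 0.3  # hypothetical parameter value
total_over_outcomes = sum(prob(o, p_fixed) for o in product([0, 1], repeat=3))

# Read as a function of p (outcome fixed), it is a likelihood:
# integrating over p (midpoint rule) does not have to give 1.
observed = (1, 1, 0)  # hypothetical evidence
steps = 10_000
area_over_p = sum(prob(observed, (i + 0.5) / steps) for i in range(steps)) / steps

print(total_over_outcomes)
print(area_over_p)
```

For this particular outcome the area over `p` is the integral of $p^2(1-p)$ on $[0, 1]$, that is $1/12$, which shows that the likelihood is not a probability distribution over the parameter.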
The Maximum Likelihood Estimation (MLE) is a procedure to find the parameters of a statistical model via the maximisation of the likelihood so as to maximise the agreement between the model and the observed data.
The maximisation of the likelihood is usually performed via the maximisation of its logarithm as it is much more convenient; the logarithm is a monotonically increasing function, so the likelihood and its logarithm peak at the same parameter values and the procedure is legit.
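A practical reason for the logarithm, beyond convenient algebra, is numerical: a product of many probabilities underflows in floating point, while a sum of logarithms does not. The sketch below assumes a coin-flip model and a made-up sample of 2000 flips; the function names are illustrative.

```python
import math

# Hypothetical data: 2000 coin flips with 70% heads, invented for illustration.
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 200

def likelihood(p, data):
    """Product of the per-flip probabilities."""
    value = 1.0
    for x in data:
        value *= p if x == 1 else 1.0 - p
    return value

def log_likelihood(p, data):
    """Sum of the per-flip log-probabilities."""
    return sum(math.log(p if x == 1 else 1.0 - p) for x in data)

# The raw likelihood is too small to be represented as a float here,
# so comparing parameter values with it would be meaningless.
raw_at_peak = likelihood(0.7, flips)

# The log-likelihood stays finite and can be maximised over a grid of p.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=lambda p: log_likelihood(p, flips))

print(raw_at_peak, best_p)
```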
Refer to the page about distributions
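The likelihood function for a Bernoulli distribution ($x_i \in \{0, 1\}$) is, for parameter $p$ and $n$ independent observations,
$$L(p) = \prod_{i=1}^n p^{x_i} (1-p)^{1-x_i}$$
so that if we take the logarithm, we get
$$\log L(p) = \left(\sum_{i=1}^n x_i\right) \log p + \left(n - \sum_{i=1}^n x_i\right) \log (1-p)$$
To maximise it, we compute and nullify the first derivative
$$\frac{\mathrm{d} \log L}{\mathrm{d} p} = \frac{\sum_{i=1}^n x_i}{p} - \frac{n - \sum_{i=1}^n x_i}{1-p} = 0$$
which leads to
$$(1-p) \sum_{i=1}^n x_i = p \left(n - \sum_{i=1}^n x_i\right)$$
and finally to
$$\hat{p} = \frac{1}{n} \sum_{i=1}^n x_i$$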
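As a quick numerical check of the Bernoulli result, the sketch below draws a synthetic sample (the seed, `true_p` and the sample size are arbitrary assumptions) and verifies that the derivative of the log-likelihood vanishes at the sample proportion of successes, being positive just below it and negative just above it, as expected at a maximum.

```python
import random

random.seed(0)
true_p = 0.3  # hypothetical parameter used only to generate data
sample = [1 if random.random() < true_p else 0 for _ in range(1000)]

n = len(sample)
k = sum(sample)   # number of successes
p_hat = k / n     # candidate ML estimate: the sample proportion

def score(p):
    """First derivative of the Bernoulli log-likelihood with respect to p."""
    return k / p - (n - k) / (1.0 - p)

print(p_hat)
print(score(p_hat - 0.01), score(p_hat), score(p_hat + 0.01))
```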
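We want to estimate $\mu$. We know the observations $x_1, \ldots, x_n$ come from a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$. The likelihood is (note that the $x_i$ are independent)
$$L(\mu, \sigma) = \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2 \sigma^2}\right)$$
Now, again it is easier to work with the logarithm:
$$\log L(\mu, \sigma) = -\frac{n}{2} \log (2 \pi \sigma^2) - \frac{1}{2 \sigma^2} \sum_{i=1}^n (x_i - \mu)^2$$
Setting the derivative with respect to $\mu$ to zero gives
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i$$
and so the maximum likelihood estimate for a given sample is 142.2, and we could do the same to estimate $\sigma$, obtaining (it can be proven through the second derivative that it is a maximum)
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \hat{\mu})^2$$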
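The Gaussian case can be checked numerically with a short sketch. The sample below is invented for illustration (it is not the sample behind the figure quoted above), and the function name is an assumption. The estimates are the sample mean and the variance with a $1/n$ factor, which is what `statistics.pvariance` computes.

```python
import math
import statistics

# Hypothetical sample, invented for illustration.
sample = [4.9, 5.3, 5.1, 4.7, 5.0]

def gaussian_log_likelihood(mu, sigma2, xs):
    """Log-likelihood of independent observations under N(mu, sigma2)."""
    n = len(xs)
    squared_error = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared_error / (2 * sigma2)

# ML estimates: sample mean and variance with a 1/n (not 1/(n-1)) factor.
mu_hat = sum(sample) / len(sample)
sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / len(sample)

# Moving either parameter away from its estimate lowers the log-likelihood.
at_estimate = gaussian_log_likelihood(mu_hat, sigma2_hat, sample)
shifted_mean = gaussian_log_likelihood(mu_hat + 0.1, sigma2_hat, sample)
scaled_variance = gaussian_log_likelihood(mu_hat, sigma2_hat * 1.5, sample)

print(mu_hat, sigma2_hat)
print(at_estimate, shifted_mean, scaled_variance)
```

Note that the ML variance estimate divides by $n$, so it differs from the unbiased sample variance, which divides by $n - 1$.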
The Maximum a Posteriori (MAP) estimation method uses the mode of the posterior distribution to estimate the unknown parameters of the population.
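From Bayes' theorem, the posterior is expressed as
$$P(\theta \mid X) = \frac{P(X \mid \theta) P(\theta)}{P(X)}$$
$\theta$ being the parameters of the statistical model and $X$ the observed data. The MAP method estimates $\theta$ as the one which maximises the posterior; note that the denominator is just a normalisation factor:
$$\hat{\theta}_{MAP} = \arg\max_\theta P(X \mid \theta) P(\theta)$$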
This means exactly taking the mode of the posterior distribution.
In the case of a uniform prior, the MAP estimation is equal to the ML estimation: the prior becomes just a constant factor, so we end up maximising the likelihood. For the computation, conjugate priors are particularly handy.
As in the case of the MLE, what we really do is maximise the logarithm of the posterior rather than the posterior itself, so we do
$$\hat{\theta}_{MAP} = \arg\max_\theta \left[ \log P(X \mid \theta) + \log P(\theta) \right]$$
In the last equation, if we only had the first term to maximise, we would be doing an ML estimation. The second term is the one accounting for the presence of a prior: this is why the MAP method is considered a regularised ML, as prior knowledge is factored into the computation.
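The effect of the prior term can be sketched for a Bernoulli model with a Beta prior, which is its conjugate prior. The counts, the prior parameters and the grid search are assumptions chosen for illustration. With a Beta(1, 1) prior, which is uniform, the log-prior term is constant and the MAP estimate coincides with the ML one; with a Beta(2, 2) prior the estimate is pulled towards 0.5.

```python
import math

# Hypothetical evidence: 7 successes out of 10 trials.
k, n = 7, 10

def log_likelihood(p):
    """Bernoulli log-likelihood for k successes in n trials."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def log_beta_prior(p, a, b):
    """Log of a Beta(a, b) prior density, up to an additive constant."""
    return (a - 1) * math.log(p) + (b - 1) * math.log(1.0 - p)

def map_estimate(a, b):
    """Maximise log-likelihood + log-prior over a grid of p values."""
    grid = [i / 1000 for i in range(1, 1000)]
    return max(grid, key=lambda p: log_likelihood(p) + log_beta_prior(p, a, b))

map_uniform = map_estimate(1, 1)   # uniform prior: same as the ML estimate k / n
map_informed = map_estimate(2, 2)  # prior centred on 0.5 acts as a regulariser

# Closed-form mode of the Beta(k + a, n - k + b) posterior, for comparison.
closed_form = (k + 2 - 1) / (n + 2 + 2 - 2)

print(map_uniform, map_informed, closed_form)
```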
While the ML method can be seen as responding to a frequentist approach, the MAP method responds to a Bayesian approach.
The EM algorithm can be used to find the solution of MLE or MAP when some data is missing, meaning there are some latent variables not observed.
Let's say that for the random variable $X$ we have the latent variables $Z$, which depend on parameters $\theta$, and that the goal is to find the parameter $\theta$ that maximises the likelihood which is of the form
$$L(\theta; X) = p(X \mid \theta) = \sum_Z p(X, Z \mid \theta)$$
meaning it is a sum over the latent variables $Z$; this makes the problem difficult to solve analytically.
The EM algorithm updates the parameters in steps, which means it risks obtaining a local rather than a global maximum.
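In the E phase (time $t$), the expected value of the complete-data log-likelihood $\log L(\theta; X, Z) = \log p(X, Z \mid \theta)$ is computed with respect to the conditional distribution of $Z$ given $X$ under the current estimate of parameters $\theta^{(t)}$:
$$Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X, \theta^{(t)}} \left[ \log L(\theta; X, Z) \right]$$
This means that the expected log-likelihood is evaluated using the current state of the parameters.
In the M phase (time $t+1$), we find the parameters which maximise the expected log-likelihood found in the E step:
$$\theta^{(t+1)} = \arg\max_\theta Q(\theta \mid \theta^{(t)})$$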
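The two phases can be sketched on a classic case: a mixture of two one-dimensional Gaussians, where the latent variable is the component each point came from. Everything specific here (the synthetic data, the seed, the crude initialisation, the fixed number of iterations and the function names) is an assumption for illustration. For this model the E step reduces to computing the responsibilities (the posterior probability of each component for each point) and the M step has closed-form weighted updates.

```python
import math
import random
import statistics

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(xs, iterations=50):
    """EM for a two-component 1-D Gaussian mixture; returns parameters and log-likelihood trace."""
    # Crude initial guess for the parameters (weight, means, standard deviations).
    w, mu1, mu2 = 0.5, min(xs), max(xs)
    sigma1 = sigma2 = statistics.pstdev(xs)
    log_liks = []
    for _ in range(iterations):
        # E step: responsibility of component 1 for each point, under current parameters.
        p1 = [w * normal_pdf(x, mu1, sigma1) for x in xs]
        p2 = [(1.0 - w) * normal_pdf(x, mu2, sigma2) for x in xs]
        log_liks.append(sum(math.log(a + b) for a, b in zip(p1, p2)))
        r = [a / (a + b) for a, b in zip(p1, p2)]
        # M step: weighted updates that maximise the expected complete-data log-likelihood.
        n1 = sum(r)
        n2 = len(xs) - n1
        w = n1 / len(xs)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1.0 - ri) * x for ri, x in zip(r, xs)) / n2
        sigma1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1)
        sigma2 = math.sqrt(sum((1.0 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2)
    return (w, mu1, sigma1, mu2, sigma2), log_liks

# Synthetic data: two well-separated clusters (hypothetical parameters).
random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(300)] + [random.gauss(5.0, 1.0) for _ in range(300)]

theta, log_liks = em_two_gaussians(data)
print(theta)
print(log_liks[0], log_liks[-1])
```

The recorded log-likelihood never decreases from one iteration to the next, which is the property that makes EM converge, although only to a local maximum that depends on the initial guess.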