Problem 1: EM for MAP estimation [25 marks]
Let \( X \) be the observed data, \( Z \) the corresponding hidden values, and \( \theta \) the parameters. We will use the EM algorithm to find the MAP solution for \( \theta \), i.e., the value of \( \theta \) that maximizes the posterior distribution over the parameters, \( p(\theta \mid X) \). In the E-step, we obtain the MAP \( Q \) function by taking the expectation of the complete-data log-posterior \( \log p(\theta \mid X, Z) \) with respect to \( p(Z \mid X, \hat{\theta}^{\text {old }}) \):
\[
Q_{M A P}\left(\theta ; \hat{\theta}^{\text {old }}\right)=\mathbb{E}_{Z \mid X, \hat{\theta}^{\text {old }}}[\log p(\theta \mid X, Z)] .
\]
In the M-step, \( Q_{M A P}\left(\theta ; \hat{\theta}^{\text {old }}\right) \) is maximized with respect to \( \theta \).
(a) [5 marks] Show that the E- and M-steps of the MAP-EM algorithm can be written as
\[
\begin{aligned}
\text{E-step:} \quad Q\left(\theta ; \hat{\theta}^{\text {old }}\right) & =\mathbb{E}_{Z \mid X, \hat{\theta}^{\text {old }}}[\log p(X, Z \mid \theta)], \\
\text{M-step:} \quad \hat{\theta}^{\text {new }} & =\underset{\theta}{\operatorname{argmax}}\left[Q\left(\theta ; \hat{\theta}^{\text {old }}\right)+\log p(\theta)\right] .
\end{aligned}
\]
How is this related to the ordinary EM algorithm?
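A sketch of the key identity behind (a), using only Bayes' rule and the definitions above. Note that \( \log p(\theta) \) does not depend on \( Z \), so it passes through the expectation unchanged:
\[
\begin{aligned}
\log p(\theta \mid X, Z) & =\log p(X, Z \mid \theta)+\log p(\theta)-\log p(X, Z), \\
Q_{M A P}\left(\theta ; \hat{\theta}^{\text {old }}\right) & =Q\left(\theta ; \hat{\theta}^{\text {old }}\right)+\log p(\theta)-\mathbb{E}_{Z \mid X, \hat{\theta}^{\text {old }}}[\log p(X, Z)] .
\end{aligned}
\]
The last term does not depend on \( \theta \), so maximizing \( Q_{M A P} \) is equivalent to maximizing \( Q+\log p(\theta) \). The E-step (the posterior over \( Z \)) is the same as in ordinary EM, and with a flat prior the M-step also reduces to the ordinary maximum-likelihood EM update.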
Now consider a univariate GMM with 2 components,
\[
p(x)=\pi_{1} \mathcal{N}\left(x \mid \mu_{1}, \sigma_{1}^{2}\right)+\left(1-\pi_{1}\right) \mathcal{N}\left(x \mid \mu_{2}, \sigma_{2}^{2}\right),
\]
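As a quick numerical reference for the model above, here is a minimal sketch of the two-component density in plain Python. The function names `normal_pdf` and `gmm2_pdf` are hypothetical, and `var1`, `var2` stand for the known variances \( \sigma_1^2, \sigma_2^2 \) (variances, not standard deviations).

```python
import math


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density N(x | mu, var), where var is a variance (sigma^2)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm2_pdf(x: float, pi1: float, mu1: float, mu2: float,
             var1: float, var2: float) -> float:
    """p(x) = pi1 * N(x | mu1, var1) + (1 - pi1) * N(x | mu2, var2)."""
    return pi1 * normal_pdf(x, mu1, var1) + (1.0 - pi1) * normal_pdf(x, mu2, var2)


# Example: a 30/70 mixture of N(-2, 1) and N(3, 0.5), evaluated at x = 0
print(gmm2_pdf(0.0, 0.3, -2.0, 3.0, 1.0, 0.5))
```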
where \( \theta=\left\{\pi_{1}, \mu_{1}, \mu_{2}\right\} \) are the parameters and the variances \( \sigma_{j}^{2} \) are known. The prior distribution is \( p(\theta)=p\left(\pi_{1}\right) p\left(\mu_{1}\right) p\left(\mu_{2}\right) \) where
\[
\begin{array}{l}
p\left(\pi_{1}\right)=1, \quad 0 \leq \pi_{1} \leq 1, \\
p\left(\mu_{1}\right)=\mathcal{N}\left(\mu_{1} \mid \mu_{0}, \sigma_{0}^{2}\right), \\
p\left(\mu_{2}\right)=\mathcal{N}\left(\mu_{2} \mid \mu_{0}, \sigma_{0}^{2}\right) .
\end{array}
\]
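A minimal sketch of the log-prior \( \log p(\theta) \) in plain Python, with hypothetical names; `mu0` and `var0` stand for \( \mu_0 \) and \( \sigma_0^2 \). Returning \( -\infty \) outside \( [0,1] \) encodes the zero density of the uniform prior there.

```python
import math


def log_normal_pdf(x: float, mu: float, var: float) -> float:
    """log N(x | mu, var), where var is a variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mu) ** 2 / var


def log_prior(pi1: float, mu1: float, mu2: float, mu0: float, var0: float) -> float:
    """log p(theta) = log p(pi1) + log N(mu1 | mu0, var0) + log N(mu2 | mu0, var0)."""
    if not 0.0 <= pi1 <= 1.0:
        return -math.inf  # the uniform prior has zero density outside [0, 1]
    # Inside [0, 1], log p(pi1) = log 1 = 0, so only the two Gaussian terms remain
    return log_normal_pdf(mu1, mu0, var0) + log_normal_pdf(mu2, mu0, var0)
```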
(b) [5 marks] Write down the complete-data log-likelihood, \( \log p(X, Z \mid \theta) \). (For convenience, you can define \( \pi_{2}=1-\pi_{1} \).)
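With one-hot indicators \( z_{ij} \) (\( z_{ij}=1 \) if \( x_i \) was drawn from component \( j \)), the complete-data log-likelihood takes the standard form \( \log p(X, Z \mid \theta)=\sum_{i} \sum_{j=1}^{2} z_{ij}\left[\log \pi_{j}+\log \mathcal{N}\left(x_{i} \mid \mu_{j}, \sigma_{j}^{2}\right)\right] \). A plain-Python sketch of evaluating it follows; the names are hypothetical, and `zs` is assumed to hold one-hot pairs such as `(1, 0)`.

```python
import math


def log_normal_pdf(x: float, mu: float, var: float) -> float:
    """log N(x | mu, var), where var is a variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mu) ** 2 / var


def complete_data_loglik(xs, zs, pi1, mu1, mu2, var1, var2):
    """sum_i sum_j z_ij * [log pi_j + log N(x_i | mu_j, var_j)] for one-hot z_i."""
    pis = (pi1, 1.0 - pi1)  # pi_2 = 1 - pi_1
    mus = (mu1, mu2)
    variances = (var1, var2)
    total = 0.0
    for x, z in zip(xs, zs):
        for j in range(2):
            if z[j]:  # only the generating component contributes (avoids 0 * log 0)
                total += math.log(pis[j]) + log_normal_pdf(x, mus[j], variances[j])
    return total
```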
(c) [5 marks] Derive the E-step, i.e., the \( Q \) function \( Q\left(\theta ; \hat{\theta}^{\text {old }}\right) \).
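Because the complete-data log-likelihood is linear in \( z_{ij} \), the expectation replaces each \( z_{ij} \) by the responsibility \( \gamma_{ij}=p(z_{ij}=1 \mid x_i, \hat{\theta}^{\text {old }})=\pi_j \mathcal{N}(x_i \mid \mu_j, \sigma_j^2) / \sum_k \pi_k \mathcal{N}(x_i \mid \mu_k, \sigma_k^2) \), evaluated at \( \hat{\theta}^{\text {old }} \). A plain-Python sketch of both pieces, with hypothetical names, assuming \( 0<\pi_1<1 \) so that the logarithms are finite:

```python
import math


def log_normal_pdf(x: float, mu: float, var: float) -> float:
    """log N(x | mu, var), where var is a variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mu) ** 2 / var


def responsibilities(xs, pi1, mu1, mu2, var1, var2):
    """E-step at theta_old: list of (gamma_i1, gamma_i2), assuming 0 < pi1 < 1."""
    gammas = []
    for x in xs:
        a1 = math.log(pi1) + log_normal_pdf(x, mu1, var1)        # log pi_1 N(x | mu1, var1)
        a2 = math.log(1.0 - pi1) + log_normal_pdf(x, mu2, var2)  # log pi_2 N(x | mu2, var2)
        m = max(a1, a2)  # shift by the max so exp() cannot overflow
        e1, e2 = math.exp(a1 - m), math.exp(a2 - m)
        gammas.append((e1 / (e1 + e2), e2 / (e1 + e2)))
    return gammas


def q_function(xs, gammas, pi1, mu1, mu2, var1, var2):
    """Q(theta; theta_old) = sum_i sum_j gamma_ij [log pi_j + log N(x_i | mu_j, var_j)]."""
    pis = (pi1, 1.0 - pi1)
    mus = (mu1, mu2)
    variances = (var1, var2)
    return sum(g[j] * (math.log(pis[j]) + log_normal_pdf(x, mus[j], variances[j]))
               for x, g in zip(xs, gammas) for j in range(2))


xs = [-2.0, -1.5, 2.5, 3.0]
gammas = responsibilities(xs, 0.5, -1.0, 1.0, 1.0, 1.0)  # responsibilities at theta_old
print(q_function(xs, gammas, 0.5, -1.7, 2.7, 1.0, 1.0))   # Q at a candidate theta
```

The data values and parameter settings in the usage lines are made up for illustration.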
(d) [5 marks] Derive the M-step, i.e., the parameter updates of \( \theta \).
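Setting the derivatives of \( Q\left(\theta ; \hat{\theta}^{\text {old }}\right)+\log p(\theta) \) to zero gives closed-form updates; with \( N_j=\sum_i \gamma_{ij} \) and \( N \) data points, one way to write them is \( \hat{\pi}_1=N_1/N \) and \( \hat{\mu}_j=(\sigma_0^2 \sum_i \gamma_{ij} x_i+\sigma_j^2 \mu_0)/(\sigma_0^2 N_j+\sigma_j^2) \). The plain-Python sketch below (hypothetical names; it assumes \( \pi_1 \) stays strictly inside \( (0,1) \), which can fail if one component captures every point) runs the full MAP-EM loop and records \( \log p(X \mid \theta)+\log p(\theta) \), which by the usual EM argument should never decrease.

```python
import math


def log_normal_pdf(x, mu, var):
    """log N(x | mu, var), where var is a variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mu) ** 2 / var


def component_logs(x, pi1, mu1, mu2, var1, var2):
    """(log pi_1 N(x | mu1, var1), log pi_2 N(x | mu2, var2)); assumes 0 < pi1 < 1."""
    return (math.log(pi1) + log_normal_pdf(x, mu1, var1),
            math.log(1.0 - pi1) + log_normal_pdf(x, mu2, var2))


def e_step(xs, pi1, mu1, mu2, var1, var2):
    """Responsibilities gamma_i1 = p(z_i1 = 1 | x_i, theta_old)."""
    g1s = []
    for x in xs:
        a1, a2 = component_logs(x, pi1, mu1, mu2, var1, var2)
        m = max(a1, a2)  # shift by the max so exp() cannot overflow
        e1, e2 = math.exp(a1 - m), math.exp(a2 - m)
        g1s.append(e1 / (e1 + e2))
    return g1s


def m_step(xs, g1s, var1, var2, mu0, var0):
    """Closed-form maximiser of Q(theta; theta_old) + log p(theta)."""
    n1 = sum(g1s)                 # soft count N_1
    n2 = len(xs) - n1             # N_2, since gamma_i2 = 1 - gamma_i1
    s1 = sum(g * x for g, x in zip(g1s, xs))
    s2 = sum((1.0 - g) * x for g, x in zip(g1s, xs))
    pi1 = n1 / len(xs)            # uniform prior on pi1: same as the ML update
    # The Gaussian prior pulls each mean from the weighted data average toward mu0
    mu1 = (var0 * s1 + var1 * mu0) / (var0 * n1 + var1)
    mu2 = (var0 * s2 + var2 * mu0) / (var0 * n2 + var2)
    return pi1, mu1, mu2


def log_posterior(xs, pi1, mu1, mu2, var1, var2, mu0, var0):
    """log p(X | theta) + log p(theta), i.e. log p(theta | X) up to a constant."""
    total = 0.0
    for x in xs:
        a1, a2 = component_logs(x, pi1, mu1, mu2, var1, var2)
        m = max(a1, a2)
        total += m + math.log(math.exp(a1 - m) + math.exp(a2 - m))  # log-sum-exp
    # log p(pi1) = 0 on [0, 1]; add the two Gaussian prior terms on the means
    return total + log_normal_pdf(mu1, mu0, var0) + log_normal_pdf(mu2, mu0, var0)


def map_em(xs, theta, var1, var2, mu0, var0, n_iter=50):
    """Alternate E- and M-steps from theta = (pi1, mu1, mu2); return final theta and history."""
    history = [log_posterior(xs, *theta, var1, var2, mu0, var0)]
    for _ in range(n_iter):
        g1s = e_step(xs, *theta, var1, var2)
        theta = m_step(xs, g1s, var1, var2, mu0, var0)
        history.append(log_posterior(xs, *theta, var1, var2, mu0, var0))
    return theta, history


xs = [-2.1, -1.8, -2.4, -1.5, 2.9, 3.3, 2.7, 3.6]
theta, history = map_em(xs, (0.5, -0.5, 0.5), var1=1.0, var2=1.0, mu0=0.0, var0=4.0)
print(theta)
```

The sample data are made up for illustration. Letting `var0` grow very large should make the mean updates approach the ordinary ML-EM ones.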
(e) [5 marks] What is the intuitive explanation of the E- and M-steps in (c) and (d)?
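One way to phrase the intuition is to rewrite the MAP mean update (obtained by setting the derivative of \( Q+\log p(\theta) \) with respect to \( \mu_j \) to zero) as a convex combination. Here \( \gamma_{ij}=p(z_{ij}=1 \mid x_i, \hat{\theta}^{\text {old }}) \) is the E-step responsibility, \( N_j=\sum_i \gamma_{ij} \), and \( \bar{x}_j=\sum_i \gamma_{ij} x_i / N_j \) is the responsibility-weighted data mean:
\[
\hat{\mu}_{j}^{\text {new }}=\frac{\sigma_{0}^{2} \sum_{i} \gamma_{i j} x_{i}+\sigma_{j}^{2} \mu_{0}}{\sigma_{0}^{2} N_{j}+\sigma_{j}^{2}}=\lambda_{j} \bar{x}_{j}+\left(1-\lambda_{j}\right) \mu_{0}, \qquad \lambda_{j}=\frac{N_{j} \sigma_{0}^{2}}{N_{j} \sigma_{0}^{2}+\sigma_{j}^{2}} .
\]
The E-step softly assigns each point to the components in proportion to how well each one explains it; the M-step then refits each component to its softly assigned points, except that each mean is shrunk toward the prior mean \( \mu_0 \). The shrinkage fades as \( N_j \) grows or as \( \sigma_0^2 \rightarrow \infty \) (recovering the ML-EM update), while a component with almost no assigned points stays near \( \mu_0 \). Because the prior on \( \pi_1 \) is uniform, its update is the ML one, \( \hat{\pi}_1=N_1/N \), where \( N \) is the number of data points.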