Learning a GMM means estimating appropriate $\pi_k, \mu_k, \Sigma_k$ for the given data $X=\{x_1, x_2, \dots, x_N\}$.
Classification with a GMM means finding which Gaussian distribution generated a given data point $x_n$.
For this, the responsibility is defined as follows:
$$\gamma(z_{nk}) = p(z_{nk}=1|x_n) \tag{4}$$
$z_{nk} \in \{0,1\}$ is a binary variable that takes the value 1 if the k-th Gaussian distribution of the GMM is selected for $x_n$, and 0 otherwise.
That is, $z_{nk}=1$ means that $x_n$ was generated by the k-th Gaussian distribution.
Classification with a GMM therefore amounts to computing the $K$ responsibilities $\gamma(z_{nk})$ for a given $x_n$ and selecting the Gaussian distribution with the largest value.
Once all parameters $\pi, \mu, \Sigma$ of the GMM have been determined through learning, $\gamma(z_{nk})$ can be calculated using Bayes' theorem as follows:

$$\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n|\mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n|\mu_j, \Sigma_j)} \tag{5}$$
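As a concrete illustration, the responsibility of equ.5 and the argmax classification rule might be implemented as in the following minimal NumPy/SciPy sketch (the function names `responsibilities` and `classify` and the array layout are illustrative choices, not part of the original derivation):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pis, mus, sigmas):
    """gamma[n, k] = p(z_nk = 1 | x_n), computed with Bayes' theorem (equ.5)."""
    N, K = X.shape[0], len(pis)
    gamma = np.zeros((N, K))
    for k in range(K):
        # numerator of equ.5: pi_k * N(x_n | mu_k, Sigma_k)
        gamma[:, k] = pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=sigmas[k])
    # divide by the denominator (sum over all components)
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma

def classify(X, pis, mus, sigmas):
    """Assign each x_n to the component with the largest responsibility."""
    return responsibilities(X, pis, mus, sigmas).argmax(axis=1)
```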
Estimating $\pi, \mu, \Sigma$ so that the log-likelihood is maximized is equivalent to constructing the GMM that best represents the given data $X$.
To do this, the log-likelihood $\mathcal{L}(X;\theta)$ is partially differentiated with respect to each of $\mu_k$, $\Sigma_k$, and $\pi_k$ and set to zero.
1) $\mu_k$
Setting the partial derivative of $\mathcal{L}(X;\theta)$ with respect to $\mu_k$ to zero gives

$$\frac{\partial \mathcal{L}(X;\theta)}{\partial \mu_k} = \sum_{n=1}^{N} \frac{\pi_k \, \mathcal{N}(x_n|\mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n|\mu_j, \Sigma_j)} \Sigma_k^{-1}(x_n - \mu_k) = \sum_{n=1}^{N} \gamma(z_{nk}) \, \Sigma_k^{-1}(x_n - \mu_k) = 0 \tag{6}$$

Multiplying both sides by $\Sigma_k$ and solving for $\mu_k$:

$$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \tag{7}$$

$N_k$ can be interpreted as the effective number of data points assigned to the k-th Gaussian distribution.
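Under the same assumptions as the earlier sketch (a responsibility matrix `gamma` of shape (N, K)), the update of equ.7 could be written as:

```python
import numpy as np

def update_means(X, gamma):
    """M-step update for the means, equ.7: mu_k = (1/N_k) * sum_n gamma_nk * x_n."""
    N_k = gamma.sum(axis=0)               # effective number of points per component, shape (K,)
    return (gamma.T @ X) / N_k[:, None]   # (K, D) matrix of updated means
```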
2) $\Sigma_k$
Setting the partial derivative of $\mathcal{L}(X;\theta)$ with respect to $\Sigma_k$ to zero and following the same argument gives

$$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T \tag{8}$$

This has the same form as the sample covariance, except that each data point is weighted by its responsibility $\gamma(z_{nk})$.
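Equ.8 can likewise be sketched as a hypothetical `update_covariances` helper, again assuming the `gamma` responsibility matrix and the mean matrix from the earlier sketches:

```python
import numpy as np

def update_covariances(X, gamma, mus):
    """M-step update for the covariances, equ.8 (responsibility-weighted sample covariance)."""
    N, D = X.shape
    K = gamma.shape[1]
    N_k = gamma.sum(axis=0)
    sigmas = np.zeros((K, D, D))
    for k in range(K):
        diff = X - mus[k]                                        # (N, D) deviations from mu_k
        sigmas[k] = (gamma[:, k, None] * diff).T @ diff / N_k[k]  # sum_n gamma_nk (x_n-mu_k)(x_n-mu_k)^T / N_k
    return sigmas
```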
3) $\pi_k$
$\pi_k$, the last parameter of the GMM, must maximize the log-likelihood while satisfying the constraint of equ.3.
Therefore $\pi_k$ is estimated using the Lagrange multiplier method; the Lagrangian $J(X;\theta,\lambda)$ is as follows:
$$J(X;\theta,\lambda) = \mathcal{L}(X;\theta) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right) \tag{9}$$
$\lambda$ is found by partially differentiating the Lagrangian with respect to $\pi_k$ and setting the result to zero.
$$\frac{\partial J(X;\theta,\lambda)}{\partial \pi_k} = \sum_{n=1}^{N} \frac{\mathcal{N}(x_n|\mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n|\mu_j, \Sigma_j)} + \lambda = 0$$

Multiplying both sides by $\pi_k$ and summing over all $k$:

$$\sum_{k=1}^{K} \sum_{n=1}^{N} \gamma(z_{nk}) + \lambda \sum_{k=1}^{K} \pi_k = N + \lambda = 0$$

$$\lambda = -N \tag{10}$$
$\pi_k$ can then be estimated by substituting the calculated $\lambda$. Multiplying $\partial J / \partial \pi_k = 0$ by $\pi_k$ gives $N_k + \lambda \pi_k = 0$, so

$$\pi_k = -\frac{N_k}{\lambda} = \frac{N_k}{N} \tag{11}$$

That is, the mixing coefficient of the k-th Gaussian distribution is the effective number of data points assigned to it divided by the total number of data points.
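Under the same assumptions, the update of equ.11 is a one-liner:

```python
import numpy as np

def update_weights(gamma):
    """M-step update for the mixing coefficients, equ.11: pi_k = N_k / N."""
    N = gamma.shape[0]
    return gamma.sum(axis=0) / N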
The estimates of equ.7, 8, and 11 all depend on $\gamma(z_{nk})$, which in turn depends on the parameters themselves, so they cannot be solved in closed form; instead, the parameters of the GMM are estimated iteratively with the EM algorithm.
In the E-step, $\gamma(z_{nk})$ is calculated for every data point and every Gaussian distribution.
Then, in the M-step, $\pi, \mu, \Sigma$ of all Gaussian distributions are re-estimated using equ.7, 8, and 11.
The E-step and M-step are repeated until convergence or for a fixed number of iterations.
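Putting the sketches together, a minimal EM loop could look like the following; it reuses the hypothetical helpers `responsibilities`, `update_means`, `update_covariances`, and `update_weights` defined above, and the initialization and stopping rule are illustrative assumptions rather than part of the original text:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    """Minimal EM loop for a GMM, alternating the E-step and M-step described above."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # naive initialization (an assumption): uniform weights, random data points as means,
    # identity covariances
    pis = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)]
    sigmas = np.stack([np.eye(D)] * K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        gamma = responsibilities(X, pis, mus, sigmas)   # E-step (equ.5)
        mus = update_means(X, gamma)                    # M-step (equ.7)
        sigmas = update_covariances(X, gamma, mus)      # M-step (equ.8)
        pis = update_weights(gamma)                     # M-step (equ.11)
        # stop once the log-likelihood stops improving
        dens = sum(pis[k] * multivariate_normal.pdf(X, mus[k], sigmas[k]) for k in range(K))
        ll = np.log(dens).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pis, mus, sigmas
```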