# EM algorithm


Published on September 26, 2014

Author: aleksetyulpin

Source: slideshare.net

## Description

The EM-algorithm is one of the most important algorithms in machine learning. This presentation shows its mathematical foundations.

Parametric estimation of the multivariate probability density function. EM-algorithm. RBF network. A. Tyulpin, Measurement Systems and Digital Signal Processing laboratory, Northern (Arctic) Federal University, September 2014. (A.Tyulpin (DSPlab), EM-algorithm & RBF nets, September 2014, 30 slides.)

Plan:

1. Examples of data classification problems
2. Classification using Bayes decision theory
3. Parametric estimation of probability density function
4. Mixture model and EM algorithm
5. EM-algorithm for Gaussian mixture model
6. Data generation using GMM
7. Radial Basis Functions

Classification problems *(example figures)*

Bayes decision rule

Binary classification problem:

- $X \subseteq \mathbb{R}^n$ – feature space, $\Omega = \{1, \dots, K\}$ – set of labels.
- $X \times \Omega$ – statistical population.
- $\omega_1, \dots, \omega_K$ – the classes.
- $X^m = \{(x_j, \omega_j)\} \subset X \times \Omega$, $j = 1, \dots, m$ – data sample.
- $p(x \mid \omega_j)$ – probability density function (pdf) of $x$ in class $\omega_j$.
- $p(x, \omega)$ – joint pdf on $X \times \Omega$.
- $P(\omega_i \mid x)$ – posterior probability that $x \in \omega_i$.

Bayes decision rule (two classes): assign $x$ to $\omega_1$ if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$.

Bayes decision rule ($K$ classes): assign $x$ to $\omega_l$, where $\omega_l = \arg\max_{\omega \in \Omega} P(\omega \mid x)$.

How to find $P(\omega_j \mid x)$? By Bayes' rule,

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}.$$

Given a data sample $X^m$, it is possible to estimate $P(\omega_j)$, but the pdfs $p(x \mid \omega_j)$ and $p(x)$ remain unknown. Two approaches to pdf estimation:

- Nonparametric, e.g. the Parzen window (kernel density estimation).
- Parametric, e.g. maximum likelihood parameter estimation and the EM-algorithm.

There is also another approach: modelling $P(\omega_j \mid x)$ directly.
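A minimal sketch of this computation, assuming 1-D Gaussian class-conditional densities with made-up means, widths, and priors (none of these numbers come from the slides):

```python
# Computing posteriors P(w_j|x) via Bayes' rule from class-conditional
# densities p(x|w_j) and priors P(w_j). All parameter values are illustrative.
import math

def gauss_pdf(x, mu, sigma):
    """1-D Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

priors = [0.5, 0.5]                      # P(w_1), P(w_2) (assumed)
params = [(0.0, 1.0), (3.0, 1.0)]        # (mu, sigma) per class (assumed)

x = 0.2
likelihoods = [gauss_pdf(x, mu, s) for mu, s in params]   # p(x|w_j)
evidence = sum(P * L for P, L in zip(priors, likelihoods))  # p(x)
posteriors = [P * L / evidence for P, L in zip(priors, likelihoods)]
decision = max(range(len(posteriors)), key=lambda j: posteriors[j])
```

Here `x = 0.2` lies close to the first class's mean, so the decision rule picks class 0.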

Maximum likelihood estimation

Let $x_1, \dots, x_m$ be random samples drawn from a pdf $p(x; \theta)$, and let $X = \{x_1, \dots, x_m\}$ with joint pdf $p(X; \theta) \equiv p(x_1, \dots, x_m; \theta)$. Assuming statistical independence between different samples, we have the likelihood function

$$p(x_1, \dots, x_m; \theta) = \prod_{k=1}^{m} p(x_k; \theta). \tag{1}$$

The maximum likelihood (ML) method estimates $\theta$ so that the likelihood function takes its maximum value:

$$\theta_{ML} = \arg\max_{\theta} \prod_{k=1}^{m} p(x_k; \theta). \tag{2}$$

Since the logarithm is monotone,

$$\arg\max_{\theta} \prod_{k=1}^{m} p(x_k; \theta) = \arg\max_{\theta} \ln \prod_{k=1}^{m} p(x_k; \theta). \tag{3}$$

Denote the log-likelihood function

$$L(\theta) = \ln \prod_{k=1}^{m} p(x_k; \theta) = \sum_{k=1}^{m} \ln p(x_k; \theta). \tag{4}$$

It takes its maximum value where

$$\frac{\partial L(\theta)}{\partial \theta} = \sum_{k=1}^{m} \frac{1}{p(x_k; \theta)} \frac{\partial p(x_k; \theta)}{\partial \theta} = 0. \tag{5}$$
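For a 1-D Gaussian, solving this stationarity condition gives the familiar closed-form ML estimates: the sample mean and the (biased) sample variance. A small numerical sketch with arbitrary data:

```python
# ML estimates for a 1-D Gaussian, obtained by setting dL/dtheta = 0:
# mu_ML is the sample mean, var_ML the biased sample variance.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # illustrative data
mu_ml = x.mean()                      # solves sum_k (x_k - mu) = 0
var_ml = ((x - mu_ml) ** 2).mean()    # solves the derivative w.r.t. sigma^2
```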

Normal distribution *(scatter plot of sample data)*

Complex distribution *(scatter plot of sample data)*

Mixture model

Let the pdf of $x \in X$ be a mixture of $K$ distributions:

$$p(x) = \sum_{j=1}^{K} \pi_j p_j(x), \quad \sum_{j=1}^{K} \pi_j = 1, \quad \pi_j \geq 0, \tag{6}$$

where $p_j(x)$ and $\pi_j \equiv P_j$ are the pdf and the prior probability of the $j$-th mixture component, respectively. $\varphi(x; \theta)$ is a parametric family of pdfs: $p_j(x) \equiv \varphi(x; \theta_j)$.

Separation of the mixture: given $K$ and $X^m$, we need to estimate the parameter vector $\Theta = [\pi_1, \dots, \pi_K, \theta_1, \dots, \theta_K]$. Naive ML estimation of $p(x)$ is a very complex problem.
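Evaluating the mixture density is a direct weighted sum of component pdfs. A sketch for a two-component 1-D Gaussian mixture; the weights and parameters below are made up for illustration:

```python
# Mixture density p(x) = sum_j pi_j * p_j(x) for a 1-D Gaussian mixture.
# All weights, means, and variances are illustrative assumptions.
import numpy as np

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

weights = np.array([0.3, 0.7])   # pi_j: must sum to 1 and be non-negative
mus = np.array([-2.0, 1.0])
vars_ = np.array([1.0, 0.5])

def mixture_pdf(x):
    return sum(w * gauss(x, m, v) for w, m, v in zip(weights, mus, vars_))

val = mixture_pdf(0.0)           # density of the mixture at x = 0
```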

EM-algorithm

EM (expectation-maximization) is an algorithm for separating a mixture of distributions. The general idea: repeat the following steps while $\Theta$ and $G$ have not stabilized:

1. $G = E(\Theta)$ (E-step)
2. $\Theta = M(\Theta, G)$ (M-step)

In the EM-algorithm we use $G = (g_{ij})_{m \times K}$, the matrix of latent variables, where $g_{ij} \equiv P(j \mid x_i)$ and $\sum_{j=1}^{K} g_{ij} = 1$ for all $i = 1, \dots, m$. $G$ is very useful for computing the ML estimate of $p(x)$.

We can use Bayes' rule:

$$g_{ij} = \frac{\pi_j p_j(x_i)}{\sum_{s=1}^{K} \pi_s p_s(x_i)}; \tag{7}$$

therefore, given $\Theta$ we can compute $G$. This is the goal of the E-step.

Now let's look at the ML estimation of $p(x)$. It is the optimization problem

$$Q(\Theta) = \sum_{i=1}^{m} \ln \sum_{j=1}^{K} \pi_j p_j(x_i) \to \max, \tag{8}$$

with equality- and inequality-type constraints

$$\sum_{j=1}^{K} \pi_j = 1, \quad \pi_j \geq 0.$$
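The E-step of eq. (7) can be sketched as a vectorized computation for 1-D Gaussian components; the data points and parameters below are illustrative:

```python
# E-step sketch: responsibilities g_ij = pi_j p_j(x_i) / sum_s pi_s p_s(x_i)
# for 1-D Gaussian components. All inputs are illustrative assumptions.
import numpy as np

def e_step(x, weights, mus, vars_):
    # dens[i, j] = pi_j * N(x_i; mu_j, var_j)
    dens = (weights * np.exp(-0.5 * (x[:, None] - mus) ** 2 / vars_)
            / np.sqrt(2 * np.pi * vars_))
    return dens / dens.sum(axis=1, keepdims=True)   # each row of G sums to 1

x = np.array([-2.1, 0.0, 1.2])
G = e_step(x, np.array([0.5, 0.5]), np.array([-2.0, 1.0]), np.array([1.0, 1.0]))
```

The first sample sits almost on the first component's mean, so its responsibility `G[0, 0]` is close to 1.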

Let's "forget" the constraints $\pi_j \geq 0$, $j = 1, \dots, K$ for a while and use a Lagrange multiplier:

$$L(\Theta; X^m) = \sum_{i=1}^{m} \ln \sum_{j=1}^{K} \pi_j p_j(x_i) - \lambda \left( \sum_{j=1}^{K} \pi_j - 1 \right). \tag{9}$$

$$\frac{\partial L}{\partial \pi_j} = \sum_{i=1}^{m} \frac{p_j(x_i)}{\sum_{s=1}^{K} \pi_s p_s(x_i)} - \lambda = 0. \tag{10}$$

Let's multiply eq. (10) by $\pi_j$, sum all $K$ such equations, and change the order of summation:

$$\sum_{i=1}^{m} \underbrace{\sum_{j=1}^{K} \frac{\pi_j p_j(x_i)}{\sum_{s=1}^{K} \pi_s p_s(x_i)}}_{1} = \lambda \underbrace{\sum_{j=1}^{K} \pi_j}_{1} \;\Rightarrow\; \lambda = m. \tag{11}$$

Now let's multiply eq. (10) by $\pi_j$ again, but with the substitution $\lambda = m$: for all $j = 1, \dots, K$

$$\sum_{i=1}^{m} \frac{\pi_j p_j(x_i)}{\sum_{s=1}^{K} \pi_s p_s(x_i)} = m \pi_j. \tag{15}$$

Recognizing $g_{ij}$ on the left-hand side, it is obvious that

$$\pi_j = \frac{1}{m} \sum_{i=1}^{m} g_{ij}. \tag{16}$$

Also, we can see that the constraints $\pi_j \geq 0$ remain satisfied if they were satisfied at the beginning.
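The weight update is just a column mean of the responsibility matrix. A sketch with an illustrative $G$ (rows summing to 1, as required):

```python
# Weight update pi_j = (1/m) sum_i g_ij, i.e. the column means of G.
# The responsibility matrix below is an illustrative assumption.
import numpy as np

G = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.4, 0.6]])      # m = 3 samples, K = 2 components
pi_new = G.mean(axis=0)         # (1/m) * sum_i g_ij for each j
```

Because every row of `G` sums to 1, the updated weights automatically sum to 1 as well.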

Recall that $p_j(x) \equiv \varphi(x; \theta_j)$. Now let's find the partial derivative $\partial L / \partial \theta_j$:

$$\frac{\partial L}{\partial \theta_j} = \sum_{i=1}^{m} \frac{\pi_j}{\sum_{s=1}^{K} \pi_s p_s(x_i)} \frac{\partial p_j(x_i)}{\partial \theta_j} = \sum_{i=1}^{m} \underbrace{\frac{\pi_j p_j(x_i)}{\sum_{s=1}^{K} \pi_s p_s(x_i)}}_{g_{ij}} \frac{\partial}{\partial \theta_j} \ln p_j(x_i) = \frac{\partial}{\partial \theta_j} \sum_{i=1}^{m} g_{ij} \ln p_j(x_i) = 0. \tag{17}$$

This is called the weighted MLE problem:

$$\theta_j = \arg\max_{\theta} \sum_{i=1}^{m} g_{ij} \ln \varphi(x_i; \theta). \tag{18}$$

The goal of the M-step is to find the new values of $\pi_j$ and to solve $K$ independent weighted MLE problems for the $\theta_j$.
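For a 1-D Gaussian component, the weighted MLE problem of eq. (18) has a closed form: a responsibility-weighted mean and variance. A sketch with illustrative data and weights:

```python
# Weighted ML estimate for one Gaussian component j: maximizing
# sum_i g_ij * ln N(x_i; mu, var) gives weighted mean and variance.
# Data and responsibilities are illustrative assumptions.
import numpy as np

x = np.array([0.0, 1.0, 2.0])
g = np.array([0.2, 0.3, 0.5])           # g_ij for a fixed component j

mu_j = np.sum(g * x) / g.sum()           # weighted mean
var_j = np.sum(g * (x - mu_j) ** 2) / g.sum()   # weighted variance
```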

Last words about the EM-algorithm in general:

- The EM-algorithm converges.
- $Q(\Theta)$ can have many extrema, so the algorithm can get stuck at a local extremum. Using the stochastic EM-algorithm can help with this problem.
- To choose $K$, we can run the EM-algorithm while sequentially increasing it.
- It is very beneficial to use well-known pdfs: Gaussian, Bernoulli, etc.

The next slides give an example of using a Gaussian mixture model.

Gaussian Mixture Model

If $\varphi(x; \theta_j) = N(x; \mu_j, \Sigma_j)$, then for all $j = 1, \dots, K$:

1st case: $\Sigma_j$ is a non-diagonal matrix (not beneficial to use):

$$\mu_j = \frac{1}{m \pi_j} \sum_{i=1}^{m} g_{ij} x_i \tag{19}$$

$$\Sigma_j = \frac{1}{m \pi_j} \sum_{i=1}^{m} g_{ij} (x_i - \mu_j)(x_i - \mu_j)^T \tag{20}$$

2nd case: $\Sigma_j$ is a diagonal matrix; for all $l = 1, \dots, n$:

$$\mu_{jl} = \frac{1}{m \pi_j} \sum_{i=1}^{m} g_{ij} x_{il} \tag{21}$$

$$\sigma^2_{jl} = \frac{1}{m \pi_j} \sum_{i=1}^{m} g_{ij} (x_{il} - \mu_{jl})^2 \tag{22}$$

There are other cases, but these two are the important ones.
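The diagonal-covariance M-step updates (21)-(22) can be sketched as vectorized NumPy; the toy data matrix and responsibilities below are illustrative:

```python
# Diagonal-covariance GMM M-step: per-coordinate weighted means and
# variances per component. X and G are illustrative assumptions.
import numpy as np

X = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 5.0]])             # m = 3 samples, n = 2 features
G = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])             # m x K responsibilities

m = X.shape[0]
pi = G.sum(axis=0) / m                              # eq. (16): weights
mu = (G.T @ X) / (m * pi)[:, None]                  # eq. (21): K x n means
var = np.stack([(G[:, j, None] * (X - mu[j]) ** 2).sum(axis=0) / (m * pi[j])
                for j in range(G.shape[1])])        # eq. (22): K x n variances
```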

EM-algorithm for a GMM with diagonal $\Sigma$:

```
Data:   X^m, K, initial [pi_1..pi_K], [mu_1..mu_K], [sigma_1..sigma_K], eps
Result: [pi_1..pi_K], [mu_1..mu_K], [sigma_1..sigma_K]

G := 0_{m x K}
repeat
    for i = 1..m, j = 1..K do
        g'_ij := g_ij
        g_ij  := pi_j N(x_i; mu_j, sigma_j) / sum_{s=1..K} pi_s N(x_i; mu_s, sigma_s)
    for j = 1..K do
        pi_j := (1/m) sum_{i=1..m} g_ij
        for l = 1..n do
            mu_jl      := 1/(m pi_j) sum_{i=1..m} g_ij x_il
            sigma^2_jl := 1/(m pi_j) sum_{i=1..m} g_ij (x_il - mu_jl)^2
until max_{i,j} |g_ij - g'_ij| < eps
return [pi_1..pi_K], [mu_1..mu_K], [sigma_1..sigma_K]
```
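The algorithm above can be sketched as runnable Python for the 1-D case (where the diagonal covariance reduces to scalar variances). The synthetic data, initial parameters, and iteration cap are all illustrative assumptions, not from the slides:

```python
# Runnable sketch of the EM loop for a 1-D Gaussian mixture.
# Stopping rule follows the slide: max |g_ij - g'_ij| < eps.
import numpy as np

def em_gmm_1d(x, K, mus, vars_, weights, eps=1e-6, max_iter=200):
    m = len(x)
    G = np.zeros((m, K))
    for _ in range(max_iter):
        G_old = G.copy()
        # E-step: responsibilities via Bayes' rule
        dens = (weights * np.exp(-0.5 * (x[:, None] - mus) ** 2 / vars_)
                / np.sqrt(2 * np.pi * vars_))
        G = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weights, then weighted means and variances
        weights = G.mean(axis=0)
        mus = (G * x[:, None]).sum(axis=0) / (m * weights)
        vars_ = (G * (x[:, None] - mus) ** 2).sum(axis=0) / (m * weights)
        if np.abs(G - G_old).max() < eps:
            break
    return weights, mus, vars_, G

# Synthetic data: two well-separated Gaussian clusters (illustrative).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
w, mu, var, G = em_gmm_1d(x, 2, np.array([-1.0, 1.0]),
                          np.array([1.0, 1.0]), np.array([0.5, 0.5]))
```

On this data the fitted means land near the true cluster centers at -3 and 3.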

Example of using GMM and EM: a GMM with 5 components fitted to the complex distribution *(scatter plot)*

Examples of data generation. Handwritten digits: 1797 images of size 8x8; 100 of them are shown.
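Generating new data from a fitted GMM works by first drawing a component index $j$ with probability $\pi_j$, then sampling from that component's Gaussian. A 1-D sketch with made-up parameters (the slides apply the same idea to the 8x8 digit images):

```python
# Sampling from a GMM: pick component j ~ Categorical(pi), then draw
# x ~ N(mu_j, var_j). All parameters below are illustrative assumptions.
import numpy as np

def sample_gmm(n, weights, mus, vars_, rng):
    comps = rng.choice(len(weights), size=n, p=weights)   # component per draw
    return rng.normal(mus[comps], np.sqrt(vars_[comps])), comps

rng = np.random.default_rng(1)
samples, comps = sample_gmm(1000, np.array([0.3, 0.7]),
                            np.array([-5.0, 5.0]), np.array([1.0, 1.0]), rng)
```

About 70% of the draws should come from the second component.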

Generated digit images for GMMs with 3, 5, 10, 20, and 30 components *(figures)*

Radial Basis Function network

Recall the Bayes decision rule: assign $x$ to $\omega_l$, where $\omega_l = \arg\max_{\omega \in \Omega} P(\omega \mid x)$.

Schema for the $M$-class problem: *(network diagram)*
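One common way to build such a network, sketched here under assumptions the slides do not spell out: hidden units are Gaussian basis functions (e.g. centered on fitted GMM components), and a linear output layer produces per-class scores, decided by argmax. The centers, widths, and output weights below are purely illustrative:

```python
# Sketch of an RBF network forward pass. Hidden unit j computes
# Phi_j(x) = exp(-||x - c_j||^2 / (2 s_j^2)); outputs are linear in Phi.
# Centers, widths, and output weights are illustrative assumptions.
import numpy as np

def rbf_forward(X, centers, widths, W):
    # d2[i, j] = squared distance from sample i to center j
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-0.5 * d2 / widths ** 2)    # hidden activations
    return Phi @ W                            # class scores

X = np.array([[0.0, 0.0], [5.0, 5.0]])       # two toy samples
centers = np.array([[0.0, 0.0], [5.0, 5.0]])  # one RBF unit per cluster
widths = np.array([1.0, 1.0])
W = np.eye(2)                                 # one output per hidden unit (toy)
scores = rbf_forward(X, centers, widths, W)
labels = scores.argmax(axis=1)                # Bayes-style argmax decision
```

Each sample sits on one of the centers, so it activates that unit strongly and is assigned the corresponding class.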

References

- K. Vorontsov. Mathematical Methods of Supervised Learning.
- S. Theodoridis. Pattern Recognition.
- Christopher M. Bishop. Pattern Recognition and Machine Learning.

Thanks! alekseytyulpin@gmail.com
