EM algorithm


Published on September 26, 2014

Author: aleksetyulpin

Source: slideshare.net

Description

The EM algorithm is one of the most important algorithms in Machine Learning. The presentation shows its mathematical foundations.

Parametric estimation of the multivariate probability density function. EM-algorithm. RBF network

A. Tyulpin
Measurement Systems and Digital Signal Processing laboratory, Northern (Arctic) Federal University
September, 2014

Plan

1. Examples of data classification problems
2. Classification using Bayes decision theory
3. Parametric estimation of probability density function
4. Mixture model and EM algorithm
5. EM-algorithm for Gaussian mixture model
6. Data generation using GMM
7. Radial Basis Functions

Classification problems

[Two figure slides with examples of data classification problems]

Bayes decision rule

Binary classification problem. $X \subseteq \mathbb{R}^n$ is the feature space and $\Omega = \{1, \dots, K\}$ is the set of labels; $X \times \Omega$ is the statistical population. $\omega_1, \dots, \omega_K$ are denoted as classes. $X^m = \{(x_j, \omega_j)\} \subset X \times \Omega$, $j = 1, \dots, m$ is the data sample. $p(x \mid \omega_j)$ is the probability density function (pdf) of $x$ in class $\omega_j$; $p(x, \omega)$ is the joint pdf on $X \times \Omega$; $P(\omega_i \mid x)$ is the posterior probability that $x \in \omega_i$.

Bayes decision rule (two classes): assign $x$ to $\omega_1$ if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$.

Bayes decision rule (K classes): assign $x$ to $\omega_l$, where
$$\omega_l = \arg\max_{\omega \in \Omega} P(\omega \mid x).$$
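As a minimal illustration (the posterior values below are invented, not from the slides), the K-class rule is a single argmax over posteriors:

```python
import numpy as np

# Hypothetical posterior probabilities P(omega | x) for K = 3 classes
posteriors = np.array([0.2, 0.5, 0.3])

# Bayes decision rule: assign x to the class with the largest posterior
predicted_class = np.argmax(posteriors)  # -> 1 (zero-based index)
print(predicted_class)
```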

How to find $P(\omega_j \mid x)$

Bayes rule:
$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j) P(\omega_j)}{p(x)}.$$

If we have a data sample $X^m$, it is possible to estimate $P(\omega_j)$, but the pdfs $p(x \mid \omega_j)$ and $p(x)$ are still unknown.

Two approaches for estimation of a pdf:
- Nonparametric, e.g. Parzen window (kernel density estimation).
- Parametric, e.g. maximum likelihood parameter estimation and the EM-algorithm.

There is also another approach: direct modelling of $P(\omega_j \mid x)$.
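A small sketch of the Bayes rule in code, assuming Gaussian class-conditional pdfs with invented parameters (none of these numbers come from the slides):

```python
import numpy as np
from scipy.stats import norm

# Assumed class-conditional pdfs p(x | omega_j): two 1-D Gaussians
class_pdfs = [norm(loc=-1.0, scale=1.0), norm(loc=2.0, scale=0.5)]
priors = np.array([0.7, 0.3])  # P(omega_j), e.g. estimated from class counts

x = 0.5
likelihoods = np.array([pdf.pdf(x) for pdf in class_pdfs])  # p(x | omega_j)
evidence = np.dot(priors, likelihoods)                      # p(x)
posteriors = priors * likelihoods / evidence                # P(omega_j | x)

print(posteriors, posteriors.argmax())
```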

Maximum likelihood estimation

Let $x_1, \dots, x_m$ be random samples drawn from a pdf $p(x; \theta)$. $X = \{x_1, \dots, x_m\}$, and $p(X; \theta) \equiv p(x_1, \dots, x_m; \theta)$ is the joint pdf. Assuming statistical independence between different samples, we have the likelihood function

$$p(x_1, \dots, x_m; \theta) = \prod_{k=1}^{m} p(x_k; \theta). \quad (1)$$

Maximum likelihood: the maximum likelihood (ML) method estimates $\theta$ so that the likelihood function takes its maximum value:

$$\theta_{ML} = \arg\max_{\theta} \prod_{k=1}^{m} p(x_k; \theta). \quad (2)$$

Maximum likelihood estimation

Obvious fact:

$$\arg\max_{\theta} \prod_{k=1}^{m} p(x_k; \theta) = \arg\max_{\theta} \ln \prod_{k=1}^{m} p(x_k; \theta). \quad (3)$$

Let the log-likelihood function be denoted as

$$L(\theta) = \ln \prod_{k=1}^{m} p(x_k; \theta) = \sum_{k=1}^{m} \ln p(x_k; \theta). \quad (4)$$

It takes its maximum value where

$$\frac{\partial L(\theta)}{\partial \theta} = \sum_{k=1}^{m} \frac{1}{p(x_k; \theta)} \frac{\partial p(x_k; \theta)}{\partial \theta} = 0. \quad (5)$$
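A hedged sketch of eqs. (2)-(4) on synthetic data: for a unit-variance Gaussian with unknown mean, numerically maximizing the log-likelihood recovers the closed-form ML estimate, the sample mean. All data and parameters below are invented:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=500)  # samples from p(x; theta), theta = mean

# Negative log-likelihood -L(theta) for a unit-variance Gaussian
def neg_log_likelihood(theta):
    return -np.sum(norm.logpdf(x, loc=theta, scale=1.0))

theta_ml = minimize_scalar(neg_log_likelihood).x
print(theta_ml, x.mean())  # the two estimates agree
```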

Normal distribution

[Figure: scatter plot of samples drawn from a normal distribution]

Complex distribution

[Figure: scatter plot of samples drawn from a complex, multimodal distribution]

Mixture model

Let the pdf of $x \in X$ be a mixture of $K$ distributions:

$$p(x) = \sum_{j=1}^{K} w_j p_j(x), \qquad \sum_{j=1}^{K} w_j = 1, \quad w_j \ge 0, \quad (6)$$

where $p_j(x)$ and $w_j \equiv P_j$ are the pdf and the prior probability of the $j$-th component of the mixture, respectively. $\varphi(x; \theta)$ is a parametric family of pdfs: $p_j(x) \equiv \varphi(x; \theta_j)$.

Separation of mixture: let $K$ and $X^m$ be given. We need to estimate the vector of parameters $\Theta = [w_1, \dots, w_K, \theta_1, \dots, \theta_K]$. The naive MLE of $p(x)$ is a very complex problem.
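A minimal sketch of evaluating the mixture density (6), assuming 1-D Gaussian components with invented parameters:

```python
import numpy as np
from scipy.stats import norm

# Invented mixture: weights w_j and component parameters theta_j = (mu_j, sigma_j)
weights = np.array([0.5, 0.3, 0.2])          # sum to 1, all >= 0
mus     = np.array([-2.0, 0.0, 3.0])
sigmas  = np.array([0.5, 1.0, 0.8])

def mixture_pdf(x):
    """p(x) = sum_j w_j * N(x; mu_j, sigma_j^2)."""
    return np.sum(weights * norm.pdf(x, loc=mus, scale=sigmas))

print(mixture_pdf(0.7))
```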

EM-algorithm

EM (expectation-maximization) is an algorithm for separation of a mixture of distributions. The general idea of the EM-algorithm is to repeat the following steps while $\Theta$ and $G$ are not stable:

1. $G = E(\Theta)$ (E-step)
2. $\Theta = M(\Theta, G)$ (M-step)

In the EM-algorithm we use $G = (g_{ij})_{m \times K}$, the matrix of latent variables: $g_{ij} \equiv P(j \mid x_i)$ and $\forall i = 1, \dots, m: \sum_{j=1}^{K} g_{ij} = 1$. $G$ is very useful for calculating the MLE of $p(x)$.

EM-algorithm

We can use the Bayes rule:

$$g_{ij} = \frac{w_j p_j(x_i)}{\sum_{s=1}^{K} w_s p_s(x_i)}, \quad (7)$$

therefore, if we have $\Theta$, we can calculate $G$. This is the goal of the E-step.

Now, let's look at the MLE of $p(x)$. This is an optimization problem

$$Q(\Theta) = \sum_{i=1}^{m} \ln \sum_{j=1}^{K} w_j p_j(x_i) \to \max, \quad (8)$$

with constraints of equality and inequality type:

$$\sum_{j=1}^{K} w_j = 1, \qquad w_j \ge 0.$$
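A vectorized sketch of the E-step (7) for 1-D Gaussian components (all parameters invented):

```python
import numpy as np
from scipy.stats import norm

def e_step(x, weights, mus, sigmas):
    """Return the m x K matrix G of responsibilities g_ij from eq. (7)."""
    # dens[i, j] = w_j * p_j(x_i)
    dens = weights * norm.pdf(x[:, None], loc=mus, scale=sigmas)
    return dens / dens.sum(axis=1, keepdims=True)  # normalize rows

x = np.array([-2.1, 0.3, 2.9, 0.0])
G = e_step(x, np.array([0.5, 0.3, 0.2]),
           np.array([-2.0, 0.0, 3.0]), np.array([0.5, 1.0, 0.8]))
print(G.sum(axis=1))  # each row sums to 1
```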

EM-algorithm

Let's "forget" about the constraints $w_j \ge 0$, $j = 1, \dots, K$ for a while and use Lagrange multipliers:

$$L(\Theta; X^m) = \sum_{i=1}^{m} \ln \sum_{j=1}^{K} w_j p_j(x_i) - \lambda \left( \sum_{j=1}^{K} w_j - 1 \right), \quad (9)$$

$$\frac{\partial L}{\partial w_j} = \sum_{i=1}^{m} \frac{p_j(x_i)}{\sum_{s=1}^{K} w_s p_s(x_i)} - \lambda = 0. \quad (10)$$

Let's multiply eq. (10) by $w_j$, sum all $K$ such equations and change the order of summation:

$$\sum_{i=1}^{m} \underbrace{\sum_{j=1}^{K} \frac{w_j p_j(x_i)}{\sum_{s=1}^{K} w_s p_s(x_i)}}_{=1} = \lambda \underbrace{\sum_{j=1}^{K} w_j}_{=1} \;\Rightarrow\; \lambda = m. \quad (11)$$

EM-algorithm

Now, let's multiply eq. (10) by $w_j$, but with the substitution $\lambda = m$:

$$\sum_{i=1}^{m} \frac{w_j p_j(x_i)}{\sum_{s=1}^{K} w_s p_s(x_i)} = m w_j. \quad (12)$$

Since by eq. (7) the left-hand side is $\sum_{i=1}^{m} g_{ij}$, it is obvious that

$$w_j = \frac{1}{m} \sum_{i=1}^{m} g_{ij}. \quad (13)$$

Also, we can see that the constraints $w_j \ge 0$ are satisfied if they were satisfied at the beginning.
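In code, the weight update (13) is just the column mean of $G$; a tiny sketch with an invented responsibility matrix:

```python
import numpy as np

G = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])  # example m x K responsibilities

# M-step weight update: w_j = (1/m) * sum_i g_ij, i.e. the column mean of G
weights = G.mean(axis=0)
print(weights)  # [0.53333333 0.46666667]
```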

EM-algorithm

Recall that $p_j(x) \equiv \varphi(x; \theta_j)$. Now, let's find the partial derivative $\partial L / \partial \theta_j$:

$$\frac{\partial L}{\partial \theta_j} = \sum_{i=1}^{m} \frac{w_j}{\sum_{s=1}^{K} w_s p_s(x_i)} \frac{\partial p_j(x_i)}{\partial \theta_j} = \sum_{i=1}^{m} \underbrace{\frac{w_j p_j(x_i)}{\sum_{s=1}^{K} w_s p_s(x_i)}}_{g_{ij}} \frac{\partial}{\partial \theta_j} \ln p_j(x_i) = \frac{\partial}{\partial \theta_j} \sum_{i=1}^{m} g_{ij} \ln p_j(x_i) = 0. \quad (14)$$

This problem is called weighted MLE:

$$\theta_j = \arg\max_{\theta} \sum_{i=1}^{m} g_{ij} \ln \varphi(x_i; \theta). \quad (15)$$

The goal of the M-step is to find new values of $w_j$ and to solve $K$ independent weighted MLE problems for $\theta_j$.
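For Gaussian components the weighted MLE (15) has a closed form: the $g$-weighted sample mean and variance. A sketch for the 1-D case (inputs invented):

```python
import numpy as np

def m_step(x, G):
    """Closed-form weighted MLE for 1-D Gaussian components.

    theta_j = argmax_theta sum_i g_ij * ln N(x_i; mu, sigma^2) gives the
    g-weighted sample mean and variance of each component.
    """
    m_j = G.sum(axis=0)                                    # effective counts
    mus = (G * x[:, None]).sum(axis=0) / m_j               # weighted means
    var = (G * (x[:, None] - mus) ** 2).sum(axis=0) / m_j  # weighted variances
    return mus, np.sqrt(var)

x = np.array([-2.1, 0.3, 2.9, 0.0])
G = np.array([[0.95, 0.04, 0.01],
              [0.10, 0.85, 0.05],
              [0.01, 0.09, 0.90],
              [0.15, 0.80, 0.05]])
print(m_step(x, G))
```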

Last words about the EM-algorithm in general

- The EM-algorithm converges.
- $Q(\Theta)$ can have many extrema, therefore the algorithm can get stuck at a local extremum. Using the stochastic EM-algorithm can help with this problem (a minimal sketch follows after this list).
- For choosing $K$ we can run the EM-algorithm while sequentially increasing the number of components.
- It is very beneficial to use well-known pdfs: Gaussian, Bernoulli, etc.

On the next slides an example of using a Gaussian Mixture Model is given.
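A minimal sketch of the stochastic E-step, under the assumption (mine, not the slides') that stochastic EM here means sampling hard component assignments from the responsibilities:

```python
import numpy as np

rng = np.random.default_rng(1)

def stochastic_e_step(G):
    """Stochastic EM: replace soft responsibilities with sampled hard labels.

    Each x_i is assigned to a single component drawn from Categorical(g_i1..g_iK);
    the injected randomness can help the algorithm escape poor local extrema.
    """
    m, K = G.shape
    z = np.array([rng.choice(K, p=G[i]) for i in range(m)])
    return np.eye(K)[z]  # one-hot m x K matrix, used in place of G in the M-step
```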

Gaussian Mixture Model

If $\varphi(x; \theta_j) = N(x; \mu_j, \Sigma_j)$, then $\forall j = 1, \dots, K$ (with $m_j = m w_j = \sum_{i=1}^{m} g_{ij}$):

1st case: $\Sigma_j$ is a non-diagonal matrix (not beneficial to use):

$$\mu_j = \frac{1}{m_j} \sum_{i=1}^{m} g_{ij} x_i, \quad (16)$$

$$\Sigma_j = \frac{1}{m_j} \sum_{i=1}^{m} g_{ij} (x_i - \mu_j)(x_i - \mu_j)^T. \quad (17)$$

2nd case: $\Sigma_j$ is a diagonal matrix; $\forall l = 1, \dots, n$:

$$\mu_{jl} = \frac{1}{m_j} \sum_{i=1}^{m} g_{ij} x_{il}, \quad (18)$$

$$\sigma_{jl}^2 = \frac{1}{m_j} \sum_{i=1}^{m} g_{ij} (x_{il} - \mu_{jl})^2. \quad (19)$$

There are other cases, but these two are the important ones.

EM-algorithm for GMM with diagonal $\Sigma_j$

Data: $X^m$, $K$, initial $[w_1, \dots, w_K]$, $[\mu_1, \dots, \mu_K]$, $[\Sigma_1, \dots, \Sigma_K]$, tolerance $\varepsilon$
Result: $[w_1, \dots, w_K]$, $[\mu_1, \dots, \mu_K]$, $[\Sigma_1, \dots, \Sigma_K]$

G := (0)_{m x K}
repeat
    for i = 1, ..., m, j = 1, ..., K do            // E-step, eq. (7)
        g'_ij := g_ij
        g_ij := w_j N(x_i; mu_j, Sigma_j) / sum_{s=1}^{K} w_s N(x_i; mu_s, Sigma_s)
    end
    for j = 1, ..., K do                           // M-step, eqs. (13), (18), (19)
        w_j := (1/m) sum_{i=1}^{m} g_ij
        for l = 1, ..., n do
            mu_jl := (1/m_j) sum_{i=1}^{m} g_ij x_il
            sigma^2_jl := (1/m_j) sum_{i=1}^{m} g_ij (x_il - mu_jl)^2
        end
    end
until max_{i,j} |g_ij - g'_ij| < epsilon
return [w_1, ..., w_K], [mu_1, ..., mu_K], [Sigma_1, ..., Sigma_K]
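Below is a runnable Python sketch of this pseudocode (not the author's code; the initialization scheme and the synthetic 2-D test data are my assumptions):

```python
import numpy as np
from scipy.stats import norm

def em_gmm_diag(X, K, eps=1e-6, max_iter=500, seed=0):
    """EM for a Gaussian mixture with diagonal covariances (see pseudocode above)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.full(K, 1.0 / K)                       # initial weights
    mu = X[rng.choice(m, K, replace=False)]       # initial means: random data points
    sigma = np.full((K, n), X.std(axis=0))        # initial per-dimension std devs
    G = np.zeros((m, K))
    for _ in range(max_iter):
        # E-step: g_ij proportional to w_j * prod_l N(x_il; mu_jl, sigma_jl^2)
        log_dens = np.log(w) + np.stack(
            [norm.logpdf(X, loc=mu[j], scale=sigma[j]).sum(axis=1) for j in range(K)],
            axis=1)
        log_dens -= log_dens.max(axis=1, keepdims=True)   # numerical stability
        G_new = np.exp(log_dens)
        G_new /= G_new.sum(axis=1, keepdims=True)
        converged = np.abs(G_new - G).max() < eps
        G = G_new
        if converged:
            break
        # M-step: closed-form updates, eqs. (13), (18), (19), with m_j = sum_i g_ij
        m_j = G.sum(axis=0)
        w = m_j / m
        mu = (G.T @ X) / m_j[:, None]
        var = np.stack([(G[:, j, None] * (X - mu[j]) ** 2).sum(axis=0) / m_j[j]
                        for j in range(K)])
        sigma = np.sqrt(np.maximum(var, 1e-12))   # guard against zero variance
    return w, mu, sigma, G

# Synthetic 2-D data from two invented Gaussian clusters
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([-3, 0], 0.5, (200, 2)),
               rng.normal([2, 2], 1.0, (300, 2))])
w, mu, sigma, G = em_gmm_diag(X, K=2)
print(w, mu, sep="\n")
```

The log-domain E-step is an implementation detail added for numerical stability; mathematically it computes exactly eq. (7).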

Example of usage of GMM and EM

[Figure: the complex distribution shown earlier, fitted with a GMM with 5 components]

Examples of data generation

Handwritten digits: 1797 images of size 8x8. 100 of them are shown below:

[Figure: 100 handwritten digit images]
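A sketch of this experiment, assuming scikit-learn's load_digits dataset (1797 images of size 8x8) and its GaussianMixture class rather than the hand-written EM above:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.mixture import GaussianMixture

digits = load_digits()                 # 1797 images of size 8x8, flattened to 64
X = digits.data

# Fit a diagonal-covariance GMM and sample new "digits" from it
gmm = GaussianMixture(n_components=10, covariance_type="diag", random_state=0)
gmm.fit(X)
generated, _ = gmm.sample(100)         # 100 synthetic 64-dimensional vectors
images = generated.reshape(-1, 8, 8)   # back to 8x8 images for display
print(images.shape)                    # (100, 8, 8)
```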

Digits generated from GMMs with an increasing number of components:

3 components: [figure]

5 components: [figure]

10 components: [figure]

20 components: [figure]

30 components: [figure]

Radial Basis Function network

Recall the Bayes decision rule: assign $x$ to $\omega_l$, where $\omega_l = \arg\max_{\omega \in \Omega} P(\omega \mid x)$.

Schema for the M-classes problem:

[Figure: RBF network schema for M classes]
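A hedged sketch of one common RBF-network construction (my reading, not necessarily the schema on the slide): Gaussian basis functions in the hidden layer, and a linear output layer fitted by least squares; the decision is the familiar argmax:

```python
import numpy as np

def rbf_features(X, centers, widths):
    """Hidden layer: phi_j(x) = exp(-||x - c_j||^2 / (2 * s_j^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths ** 2))

def fit_rbf_classifier(X, y, centers, widths, n_classes):
    """Output layer: least-squares weights mapping RBF features to one-hot labels."""
    Phi = rbf_features(X, centers, widths)
    T = np.eye(n_classes)[y]                  # one-hot targets
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return W

def predict(X, centers, widths, W):
    # Bayes-style decision rule: pick the class with the largest network output
    return (rbf_features(X, centers, widths) @ W).argmax(axis=1)
```

Taking the centers and widths from a GMM fitted by EM is one standard way to connect the two halves of the talk.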

References

K. Vorontsov. Mathematical methods of supervised learning.
S. Theodoridis. Pattern Recognition.
Christopher M. Bishop. Pattern Recognition and Machine Learning.

Thanks!

alekseytyulpin@gmail.com
