# Lecture18


Published on March 19, 2009

Author: aorriols

Source: slideshare.net

Introduction to Machine Learning
Lecture 18: Clustering
Albert Orriols i Puig
http://www.albertorriols.net
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull

Recap of Lecture 17 (Slide 2)

- Clustering
- Hierarchical clustering

Today's Agenda (Slide 3)

- Partitional clustering: K-means
- Applications of clustering
- Using Weka

Partitional Clustering (Slide 4)

- Aim: assign a set of objects to K clusters with no hierarchical structure.
- How? First approach: enumerate all partitions and pick the one that minimizes a measure of quality.
- However, this is too expensive when the number of elements increases. E.g., organizing 30 objects into 3 groups yields on the order of 10^13 partitions (about 2·10^14 if the groups are labeled).
- Hence, we need heuristic methods.
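To see why exhaustive enumeration is impractical, the number of ways to split N objects into K non-empty, unlabeled groups (a Stirling number of the second kind) can be computed directly. The sketch below is illustrative only; the function name `count_partitions` is an assumption, not something from the lecture.

```python
from math import comb, factorial

def count_partitions(n, k):
    """Number of ways to split n distinct objects into k non-empty,
    unlabeled groups (Stirling number of the second kind),
    computed with the inclusion-exclusion formula."""
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)

# 30 objects into 3 groups, as in the example on the slide
print(count_partitions(30, 3))  # 34314651811530, about 3.4e13
```

Even for this small example, evaluating a quality measure on every partition is out of reach, which motivates heuristic methods.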

Defining the Problem (Slide 5)

- The problem: map N objects into K clusters, where each object belongs to exactly one cluster.
- Key factors:
  - Criterion function
  - Algorithm process
- We'll see squared error algorithms.

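Squared Error Algorithms (Slide 6)

Definition of squared error:

- Assume a collection of objects x1, x2, …, xN.
- We want to organize them into K clusters c1, c2, …, cK.
- The squared error criterion is defined, in its standard form, as

$$J = \sum_{k=1}^{K} \sum_{x_i \in c_k} \lVert x_i - m_k \rVert^2$$

where $m_k = \frac{1}{|c_k|} \sum_{x_i \in c_k} x_i$ is the mean (prototype) of cluster $c_k$.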
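As a rough illustration, the squared error of a given clustering can be computed as the sum of squared Euclidean distances from each object to the mean of its cluster. This is a minimal sketch; the function name `squared_error` and the representation of a clustering as a list of cluster indices are assumptions made for the example.

```python
def squared_error(points, assignment, k):
    """Sum of squared Euclidean distances from each point to the mean
    of the cluster it is assigned to. `assignment[i]` is the cluster
    index (0..k-1) of `points[i]`."""
    total = 0.0
    for j in range(k):
        members = [p for p, a in zip(points, assignment) if a == j]
        if not members:
            continue  # an empty cluster contributes nothing
        mean = [sum(coords) / len(members) for coords in zip(*members)]
        for p in members:
            total += sum((x - m) ** 2 for x, m in zip(p, mean))
    return total

points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(squared_error(points, [0, 0, 1], k=2))  # 2.0
print(squared_error(points, [0, 1, 1], k=2))  # 32.0
```

The first clustering groups the two nearby points together and therefore has the lower error.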

Formulation of the Problem (Slide 7)

- Goal: find the clusterization that minimizes the squared error over all possible clusterizations.
- Characteristics of k-means:
  - It was discovered by several researchers across different disciplines.
  - It requires the user to specify the number of clusters, k. In this way, we avoid the problem of determining the number of clusters.
  - It uses a heuristic procedure to finish with the best prototypes.

K-means (Slide 8)

The procedure:

1. Initialize a k-partition randomly or based on some prior knowledge. Calculate the cluster prototype matrix M.
2. Assign each object of the data set to the nearest cluster center (ci).
3. Recalculate the cluster prototype matrix based on the current partition.
4. Repeat steps 2 and 3 until there is no change for any cluster.

Will this lead to the best solution? There is no guarantee. At least, it will lead to a locally optimal solution.
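The four steps can be sketched in plain Python as follows. This is an illustrative sketch rather than the lecture's own code: it assumes Euclidean distance, initializes by picking k objects at random as prototypes (one common choice among several), and the names `k_means`, `max_iter`, and `seed` are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1 (assumed variant): use k distinct objects as initial prototypes.
    prototypes = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each object to the nearest prototype.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, prototypes[j]))
            for p in points
        ]
        # Step 4: stop when no object changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each prototype as the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old prototype if a cluster is empty
                prototypes[j] = mean_point(members)
    return prototypes, assignment

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
prototypes, labels = k_means(data, k=2)
print(labels)
```

On this toy data the two well-separated groups end up in different clusters; on harder data the result depends on the initialization, which is why the solution is only locally optimal.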

Example of k-means (Slides 9 to 12, figures only)

Conservative k-means algorithm (Slide 13)

- The Lloyd algorithm is fast, but in each iteration it moves many data points, not necessarily causing better convergence.
- A more conservative method would be to move one point at a time, and only if it improves the overall clustering cost.
  - The smaller the clustering cost of a partition of data points, the better that clustering is.
  - Different methods (e.g., the squared error distortion) can be used to measure this clustering cost.

Greedy k-means algorithm (Slide 14)

1. Select an arbitrary partition P into k clusters
2. while forever
   1. bestChange ← 0
   2. for every cluster C
      1. for every element i not in C
         1. if moving i to cluster C reduces its clustering cost
            1. if cost(P) − cost(P_{i → C}) > bestChange
               1. bestChange ← cost(P) − cost(P_{i → C}); i* ← i; C* ← C
   3. if bestChange > 0
      1. Change partition P by moving i* to C*
   4. else
      1. return P
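A possible Python rendering of the greedy procedure is sketched below. It assumes the clustering cost is the squared error to the cluster means (the slides allow other cost measures), and the names `cost` and `greedy_k_means` are hypothetical. Recomputing the full cost for every candidate move is deliberately naive, to stay close to the pseudocode.

```python
def cost(points, assignment, k):
    """Clustering cost: squared error of each point to its cluster mean."""
    total = 0.0
    for j in range(k):
        members = [p for p, a in zip(points, assignment) if a == j]
        if not members:
            continue
        mean = [sum(c) / len(members) for c in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mean)) for p in members)
    return total

def greedy_k_means(points, assignment, k):
    assignment = list(assignment)  # start from an arbitrary partition P
    while True:
        best_change, best_move = 0.0, None
        current = cost(points, assignment, k)
        for c in range(k):                     # for every cluster C
            for i in range(len(points)):       # for every element i not in C
                if assignment[i] == c:
                    continue
                old = assignment[i]
                assignment[i] = c              # tentatively move i to C
                change = current - cost(points, assignment, k)
                assignment[i] = old            # undo the tentative move
                if change > best_change:
                    best_change, best_move = change, (i, c)
        if best_move is None:
            return assignment                  # no single move improves the cost
        i_star, c_star = best_move
        assignment[i_star] = c_star            # apply the best single move

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
result = greedy_k_means(points, [0, 1, 0, 1], k=2)
print(result, cost(points, result, 2))
```

Because only one point moves per iteration and every move strictly lowers the cost, the loop terminates, although it can take many more iterations than the Lloyd algorithm.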

Some Remarks (Slide 15)

Further comments about k-means:

- There is no efficient and universal method for identifying the initial partitions.
  - Run the algorithm many times with random initial partitions.
- The iterative approach cannot guarantee convergence to the global optimum.
  - Techniques such as genetic algorithms (GAs) or simulated annealing (SA) can be incorporated to push the search toward the global optimum.
- It is sensitive to outliers and noise.
  - Some approaches, such as ISODATA and PAM, consider the effect of outliers.
- The definition of "means" restricts the application to continuous variables.
  - New dissimilarity measures are needed to deal with categorical variables.
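The sensitivity to outliers can be illustrated with a one-dimensional toy cluster: a single extreme value drags the mean far from the bulk of the data, whereas a medoid (an actual data point minimizing the total distance to the others, the kind of prototype used by PAM) stays put. The data and the helper names `mean` and `medoid` below are made up for the illustration.

```python
def mean(values):
    return sum(values) / len(values)

def medoid(values):
    """The data point with the smallest total absolute distance to all others."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))

cluster = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 plays the role of an outlier

print(mean(cluster))    # 22.0, pulled toward the outlier
print(medoid(cluster))  # 3.0, stays within the bulk of the data
```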

Applications (Slide 16)

Traveling Salesman Problem (Slide 17)

- Up to millions of cities
- First organize the cities in clusters
- Results for 10k cities, 100k cities, and 1M cities (figures)

Bioinformatics – Gene Expression Data (Slide 18)

- Application to:
  - Genome sequencing projects
  - DNA microarray technologies
- DNA microarray technology:
  - An effective and efficient way to measure the gene expression levels of thousands of genes simultaneously
  - Investigation of the role of the genes
- Clustering:
  - Reveals hidden structures of biological data
  - Assumption: functionally similar genes or proteins usually share similar patterns or primary sequence structures

Bioinformatics – Gene Expression Data (Slides 19 and 20, figures only)

Next Class (Slide 21)

- Genetic Fuzzy Systems

