# Lecture 12 - SVM


Published on March 2, 2009

Author: aorriols

Source: slideshare.net

Introduction to Machine Learning, Lecture 12: Support Vector Machines. Albert Orriols i Puig (aorriols@salle.url.edu). Artificial Intelligence - Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull.

## Recap of Lecture 11

- 1st-generation neural networks: perceptrons and others
- Also multi-layer perceptrons

## Recap of Lecture 11 (continued)

- 2nd-generation neural networks
- Some people figured out how to adapt the weights of the internal layers
- They seemed very powerful, able to solve almost anything
- Reality showed that this was not exactly true

## Today's Agenda

- Moving to SVM
- Linear SVM: the separable case
- Linear SVM: the non-separable case
- Non-linear SVM

## Introduction

SVM (Vapnik, 1995) is a clever type of perceptron. Instead of hand-coding the layer of non-adaptive features, each training example is used to create a new feature, following a fixed recipe. A clever optimization technique is then used to select the best subset of features. Many neural-network researchers switched to SVMs in the 1990s because they worked better. Here, we'll take a slow path into the SVM concepts.

## Shattering Points with Oriented Hyperplanes

Remember the idea: we want to build hyperplanes that separate the points of two classes. In a two-dimensional space, these hyperplanes are lines (e.g., a linear classifier). Which is the best separating line? Remember that a hyperplane is represented by the equation w · x + b = 0.
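The hyperplane equation above turns into a linear classifier directly: a point is assigned to one class or the other depending on the sign of w · x + b. A minimal sketch in NumPy, with w and b chosen arbitrarily for illustration:

```python
import numpy as np

# A hyperplane in R^2 is the set of points x with w . x + b = 0.
# These particular values of w and b are illustrative assumptions.
w = np.array([2.0, 1.0])
b = -4.0

def classify(x):
    """Assign +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([3.0, 3.0])))   # -> 1  (on the positive side)
print(classify(np.array([0.0, 0.0])))   # -> -1 (on the negative side)
```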

## Linear SVM

We want the line that maximizes the margin between the examples of both classes. The examples lying closest to that line are the support vectors.

## Linear SVM

In more detail: let's assume two classes, yi ∈ {-1, +1}. Each example is described by a set of features x (x is a vector; for clarity, we will mark vectors in bold in the remainder of the slides). The problem can be formulated as follows. In the separable case, all training examples must satisfy

    w · xi + b ≥ +1  for yi = +1
    w · xi + b ≤ -1  for yi = -1

These two conditions can be combined into a single one:

    yi (w · xi + b) - 1 ≥ 0  for all i

## Linear SVM

What are the support vectors? Consider the points that lie on the hyperplane H1: w · x + b = +1. Their perpendicular distance to the origin is |1 - b| / ||w||. Now consider the points that lie on the hyperplane H2: w · x + b = -1. Their perpendicular distance to the origin is |-1 - b| / ||w||. The margin between the two hyperplanes is therefore 2 / ||w||.
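The margin 2 / ||w|| is easy to check numerically: it equals the difference between the distances of H1 and H2 to the origin. A small sketch, where the values of w and b are illustrative assumptions:

```python
import numpy as np

# Illustrative hyperplane parameters (assumed for this sketch): ||w|| = 5.
w = np.array([3.0, 4.0])
b = 10.0

norm_w = np.linalg.norm(w)
d_H1 = abs(1 - b) / norm_w    # distance of H1 (w.x + b = +1) to the origin
d_H2 = abs(-1 - b) / norm_w   # distance of H2 (w.x + b = -1) to the origin
margin = 2.0 / norm_w

print(d_H1, d_H2, margin)     # -> 1.8 2.2 0.4  (margin = d_H2 - d_H1)
```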

## Linear SVM

Therefore, the problem is: find the hyperplane that minimizes ||w||² / 2, subject to yi (w · xi + b) - 1 ≥ 0 for all i. But let us switch to the Lagrangian formulation, for two reasons: the constraints will be placed on the Lagrange multipliers themselves, which are easier to handle, and the training data will appear only in the form of dot products between vectors.

## Linear SVM

The Lagrangian formulation comes to be

    Lp = ||w||² / 2 - Σi αi [yi (w · xi + b) - 1]

where the αi are the Lagrange multipliers. So now we need to minimize Lp with respect to w and b, and simultaneously require that the derivatives of Lp with respect to all the αi vanish, all subject to the constraints αi ≥ 0.

## Linear SVM

Transformation to the dual problem: this is a convex problem, so we can equivalently solve the dual. That is, maximize

    LD = Σi αi - (1/2) Σi Σj αi αj yi yj (xi · xj)

with respect to the αi, subject to the constraint Σi αi yi = 0 and with αi ≥ 0. The weight vector is then recovered as w = Σi αi yi xi.

## Linear SVM

This is a quadratic programming problem. You can solve it with many methods, such as gradient descent. We will not see these methods in class.
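Even without a QP library, the dual can be solved by hand for a toy problem. A sketch, assuming two one-dimensional training points x1 = +1 (y1 = +1) and x2 = -1 (y2 = -1): the equality constraint Σi αi yi = 0 forces α1 = α2 = a, the dual objective reduces to LD(a) = 2a - 2a², and plain gradient ascent finds its maximum at a = 1/2:

```python
import numpy as np

# Toy separable problem: x1 = +1 with y1 = +1, x2 = -1 with y2 = -1.
# The constraint alpha_1*y_1 + alpha_2*y_2 = 0 gives alpha_1 = alpha_2 = a,
# so the dual objective collapses to L_D(a) = 2a - 2a^2.
x = np.array([1.0, -1.0])
y = np.array([1.0, -1.0])

a = 0.0
lr = 0.1
for _ in range(200):
    grad = 2.0 - 4.0 * a         # dL_D/da
    a = max(0.0, a + lr * grad)  # gradient ascent, keeping a >= 0

alpha = np.array([a, a])
w = np.sum(alpha * y * x)        # w = sum_i alpha_i y_i x_i
print(a, w)                      # a -> 0.5, w -> 1.0, so the margin 2/|w| = 2
```

The recovered margin of 2 matches the actual gap between the two points, as it should.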

## The Non-Separable Case

What if the two classes cannot be separated? Then we will not be able to satisfy the constraints of the Lagrangian formulation proposed above. Any idea?

## The Non-Separable Case

Just relax the constraints by permitting some errors: introduce slack variables ξi ≥ 0 and require yi (w · xi + b) ≥ 1 - ξi.

## The Non-Separable Case

That means that the Lagrangian is rewritten. The objective function to be minimized becomes

    ||w||² / 2 + C Σi ξi

Therefore, we are simultaneously maximizing the margin and minimizing the error. C is a constant to be chosen by the user. The dual problem keeps the same objective LD, but is now subject to Σi αi yi = 0 and 0 ≤ αi ≤ C.
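The soft-margin objective ||w||²/2 + C Σi ξi can also be minimized directly in the primal via subgradient descent on the equivalent hinge-loss form; this is a different route than the dual QP on the slides, shown here only as a sketch on synthetic, linearly separable data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters (an assumption made for this sketch).
X = np.vstack([rng.normal([2, 2], 0.3, (20, 2)),
               rng.normal([-2, -2], 0.3, (20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)

C, lr = 1.0, 0.01
w, b = np.zeros(2), 0.0
for _ in range(2000):
    margins = y * (X @ w + b)
    viol = margins < 1                              # points inside the margin
    grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

preds = np.sign(X @ w + b)
print((preds == y).mean())                          # training accuracy
```

Larger C penalizes margin violations more heavily; smaller C trades some training errors for a wider margin.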

## Non-Linear SVM

What happens if the decision function is not a linear function of the data? In our equations, the data appears only in the form of dot products xi · xj. Wouldn't you like to have polynomial, logarithmic, ... functions to fit the data?

## Non-Linear SVM

The kernel trick: map the data into a higher-dimensional space. Mercer's theorem: any continuous, symmetric, positive semi-definite kernel function K(x, y) can be expressed as a dot product in a high-dimensional space. So now we have a kernel function K(xi, xj) = φ(xi) · φ(xj). All we have talked about still holds when using the kernel function; the only difference is that the dot products xi · xj are replaced by K(xi, xj), both in the dual problem and in the resulting decision function.
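The kernel trick can be verified numerically. For the homogeneous quadratic kernel K(x, z) = (x · z)² in two dimensions, an explicit feature map φ into 3-D exists, and the kernel value equals the dot product in the mapped space without ever computing φ. A sketch with arbitrary example vectors:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the homogeneous quadratic kernel in 2-D."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def K(x, z):
    """Same quantity computed directly in input space: K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(K(x, z), np.dot(phi(x), phi(z)))  # both print 1.0: the values agree
```

The computational payoff is that K works in the original space even when φ maps into a space of very high (or infinite) dimension.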

## Non-Linear SVM

Some typical kernels:

- Polynomial: K(x, y) = (x · y + 1)^p
- Gaussian (RBF): K(x, y) = exp(-||x - y||² / (2σ²))
- Sigmoid: K(x, y) = tanh(κ x · y - δ)

The original slide includes a visual example of a polynomial kernel with p = 3.
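The polynomial and Gaussian kernels above are one-liners to implement; a sketch with illustrative default parameters (p = 3 as in the slide's example, σ = 1):

```python
import numpy as np

def poly_kernel(x, z, p=3):
    """Polynomial kernel (x . z + 1)^p, with p = 3 as in the slide's example."""
    return (np.dot(x, z) + 1.0) ** p

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

x = np.array([1.0, 0.0])
print(poly_kernel(x, x))   # (x.x + 1)^3 = (1 + 1)^3 = 8.0
print(rbf_kernel(x, x))    # any point has RBF similarity 1.0 with itself
```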

## Some Further Issues

We have to classify data that may be described by nominal attributes as well as continuous ones, probably with missing values, and that may have more than two classes. How does SVM deal with all this?

- SVM is defined over continuous attributes: no problem!
- Nominal attributes: map them into a continuous space.
- Multiple classes: build SVMs that discriminate each pair of classes.
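The pairwise scheme in the last bullet (one-vs-one) trains one binary SVM per pair of classes, k(k-1)/2 in total, and classifies by majority vote among the pairwise decisions. A sketch that just enumerates the required pairs (the class names are hypothetical):

```python
from itertools import combinations

# One-vs-one: one binary SVM per unordered pair of classes.
classes = ["setosa", "versicolor", "virginica"]  # hypothetical class names
pairs = list(combinations(classes, 2))
print(len(pairs))   # k = 3 classes -> 3 * 2 / 2 = 3 binary SVMs
for a, b in pairs:
    print(f"train SVM for {a} vs {b}")
```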

## Some Further Issues

I've seen lots of formulas, but I want to program an SVM builder. How do I get my SVM? We have already mentioned that there are many methods for solving the quadratic programming problem, and many algorithms have been designed specifically for SVMs. One of the most significant is Sequential Minimal Optimization (SMO). Currently, there are many newer algorithms as well.

## Next Class

Association Rules


