Information about Lecture7 - IBk

Recap of Lecture 6 LET’S START WITH DATA CLASSIFICATION Slide 2 Artificial Intelligence Machine Learning

Recap of Lecture 6 Data Set Classification Model How? We are going to deal with: • Data described by nominal and continuous attributes • Data that may have instances with missing values Slide 3 Artificial Intelligence Machine Learning

Recap of Lecture 6 We want to build decision trees How can I automatically generate these types of trees? Decide which attribute we should put in each node Decide a split point Rely on information theory We also saw many other improvements Slide 4 Artificial Intelligence Machine Learning

Today’s Agenda Classification without building a model K-Nearest Neighbor (kNN) Effect of K Distance functions Variants of K-NN Strengths and weaknesses Slide 5 Artificial Intelligence Machine Learning

Classification without Building a Model Forget about a global model! g g Simply store all the training examples Build local model f each new t t i t B ild a l l d l for h test instance Refered to as lazy learners Some approaches to IBL Nearest neighbors Locally weighted regression Case-based reasoning Slide 6 Artificial Intelligence Machine Learning

k-Nearest Neighbors Algorithm g Store all the training data Given a new t t instance Gi test i t Recover the k neighbors of the test instance Predict th P di t the majority class among the neighbors j it l th i hb Voronoi Cells: The feature space is decomposed into several cells. E.g. for k=1 Slide 7 Artificial Intelligence Machine Learning

k-Nearest Neighbors But, where is the learning process? , gp Select the k neighbors and return the majority class is learning? No, that’s just t i i N th t’ j t retrieving But still, some important issues Which k should I use? Which distance functions should I use? Should I maintain all instances of the training data set? Slide 8 Artificial Intelligence Machine Learning

Which k Should I Use? The effect of k 15-NN 1-NN Do you remember the discussion about overfitting in C4.5? Apply the same concepts here! Slide 9 Artificial Intelligence Machine Learning

Which k Should I Use? Some experimental results on the use of different k p 7-NN Number of neighbors Notice that the test error decreases as k increases but at k ≈ 5- increases, 7, it starts increasing again Rule of thumb: k=3 k=5 and k=7 seem to work ok in the k=3, k=5, majority of problems Slide 10 Artificial Intelligence Machine Learning

Distance Functions Distance functions must be able to Nominal attributes Continuous attributes C ti tt ib t Missing values The key They must return a low value for similar objects and a high value for different objects Seems obvious, right? But still, it is domain dependent obvious still There are many of them. Let’s see some of the most used Slide 11 Artificial Intelligence Machine Learning

Distance Functions Distance between two points in the same space p p d(x, y) Some properties expected to be satisfied in general d(x, y) ≥ 0 and d(x, x) = 0 d(x y) = d(y x) d(x, d(y, d(x, y) + d(y, z) ≥ d(x, z) Slide 12 Artificial Intelligence Machine Learning

Distances for Continuous Variables Given x=(x1,…,xn)’ and y=(y1,…,yn)’ n d E ( x, y ) = [∑ ( xi − yi ) 2 ]1/ 2 Euclidean i =1 n d E ( x, y ) = [∑ ( xi − yi ) ] q 1/ q Minkowsky i =1 n d ABS ( x, y ) = ∑ | xi − yi | Distance absolute value i =1 Slide 13 Artificial Intelligence Machine Learning

Distances for Continuous Variables What if attributes are measured over different scales? Attribute 1 ranging in [0,1] Attribute 2 ranging in [0 1000] [0, Can you detect any potential problem in the aforementioned distance functions? X in [0,1], y in [0,1000] X in [0,1000], y in [0,1000] Slide 14 Artificial Intelligence Machine Learning

Distances for Continuous Variables The larger the scale, the larger the influence of the g , g attribute in the distance function Solution: Normalize each attribute How: Normalization by means of the range d (ex1a , ex2 ) a d anorm (ex1 , ex2 ) = a a max a − min a Normalization by means of the standard deviation d (ex1a , ex2 ) a d anorm (ex1a , ex2 ) = a 4σ a Slide 15 Artificial Intelligence Machine Learning

Distances for Nominal Attributes Several metrics to deal with nominal attributes Overlap distance function Idea: Two nominal attributes are equal only if they have the same value Slide 16 Artificial Intelligence Machine Learning

Distances for Nominal Attributes Several metrics to deal with nominal attributes Value difference metric (VDM) C = number of classes P(a exia, c) = conditional probability P(a, that the output class is c given that the attribute a has de value exia. Idea: Two nominal values are similar if they have more similar correlations with the output classes See (Wilson & Martinez) for more distance functions Slide 17 Artificial Intelligence Machine Learning

Distances for Heterogeneous Attributes What if my data set is described by both nominal and continuous attributes? Apply the same distance function Use nominal distance functions for nominal attributes Use continuous distance function for continuous attributes Slide 18 Artificial Intelligence Machine Learning

Variants of kNN Different variants of kNN Distance-weighted kNN Attribute-weighted kNN Slide 19 Artificial Intelligence Machine Learning

Distance-Weighted kNN Inference of original kNN g The k nearest neighbors vote for the class Shouldn t Shouldn’t the closest examples have a higher influence in the decision process? Weight the contribution of each of the k neighbors wrt their distance E.g., k f ( xq ) = arg max ∑ wiδ (v, f ( xi )) ˆ k ∑ wi f ( xi ) v∈V i =1 f ( xq ) = ˆ i =1 1 k where wi = ∑ wi d ( xq , xi ) 2 i =1 More robust to noisy instances and outliers E.g.: Shepard’s method (Shepard,1968) Slide 20 Artificial Intelligence Machine Learning

Attribute-weighted kNN What if some attributes are irrelevant or misleading? g If irrelevant cost increases, but accuracy is not affected If misleading i l di cost increases and accuracy may d ti d decrease Weight attributes: n d w( x, y ) = ∑ wi ( xi − yi ) 2 i =1 How to determine the weights? Option 1: The expert p p p provide us with the weights g Option 2: Use a machine learning approach More will be said in the next lecture! Slide 21 Artificial Intelligence Machine Learning

Strengths and Weaknesses Strengths of kNN Building of a new local model for each test instance Learning has no cost Empirical results show that the method is highly accurate w.r.t other machine learning techniques Weaknesses Retrieving approach, but does not learn No global model. The knowledge is not legible Test cost increases linearly with the input instances No generalization Curse of dimensionality: What happens if we have many attributes? Noise and outliers may have a very negative effect Slide 22 Artificial Intelligence Machine Learning

Next Class From instance-based to case-based reasoning A little bit more on learning Distance functions Prototype selection Slide 23 Artificial Intelligence Machine Learning

Introduction to Machine Learning Lecture 7 Instance Based Learning Albert Orriols i Puig aorriols@salle.url.edu i l @ ll ld Artificial Intelligence – Machine Learning Enginyeria i Arquitectura La Salle gy q Universitat Ramon Llull

1. Introduction to MachineLearning Lecture 7Instance Based LearningAlbert Orriols i Puigaorriols@salle.url.edui l @ ll ldArtificial Intelligence ...

Read more

Lecture7. by ttaljobory. on Aug 04, 2015. Report Category: Documents. Download: 0 Comment: 0. 51. views. Comments. Description. Download Lecture7 ...

Read more

Lecture7 Pricing Apr 02, 2015 Documents-jindal. of 23

Read more

ETH Zürich - D-BAUG - IBK - Prof. Dr. Michael H. Faber - Lehre - Risk and Safety ... Lecture7_GeNIe.rar: 09.11.11 Structural Reliability - Basic Principles

Read more

Identi cation Methods for Structural Systems Prof. Dr. Eleni Chatzi Lecture 7 - 10 April, 2013 Institute of Structural Engineering Identi cation Methods ...

Read more

Part 1 : Modelling Practical application of the MFE Method of Finite Elements I. Institute of Structural Engineering Page 2 Method of Finite Elements I

Read more

Bargh article - Automaticity of Social Behavior: Direct Effects of Trait Construct and Stereotype from PSY 1901 at Harvard

Read more

Methods to Pre-Process Training Data for K-Nearest Neighbors Algorithm. Research Paper / Jan 2010 Download Now. Related Content. Putting Together ...

Read more

## Add a comment