Lecture13 - Association Rules


Published on March 5, 2009

Author: aorriols

Source: slideshare.net

Introduction to Machine Learning
Lecture 13: Introduction to Association Rules
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull

Recap of Lectures 5-12
LET’S START WITH DATA CLASSIFICATION

Recap of Lectures 5-12
Data set → classification model: how?
We have seen four different types of approaches to classification:
• Decision trees (C4.5)
• Instance-based algorithms (kNN & CBR)
• Bayesian classifiers (Naïve Bayes)
• Neural networks (Perceptron, Adaline, Madaline, SVM)

Today’s Agenda
• Introduction to Association Rules
• A Taxonomy of Association Rules
• Measures of Interest
• Apriori

Introduction to AR
The ideas come from market basket analysis (MBA).
Let’s go shopping!
• Customer 1: milk, eggs, sugar, bread
• Customer 2: milk, eggs, cereal, bread
• Customer 3: eggs, sugar
What do my customers buy? Which products are bought together?
Aim: find associations and correlations between the different items that customers place in their shopping baskets.

Introduction to AR
Formalizing the problem a little bit:
• Transaction database T: a set of transactions T = {t1, t2, …, tn}
• Each transaction contains a set of items (an itemset)
• An itemset is a collection of items I = {i1, i2, …, im}
General aim: find frequent or interesting patterns, associations, correlations, or causal structures among sets of items or elements in databases or other information repositories, and express these relationships as association rules X ⇒ Y.

Example of AR
TID   Items
T1    bread, jelly, peanut-butter
T2    bread, peanut-butter
T3    bread, milk, peanut-butter
T4    beer, bread
T5    beer, milk
Example rules: bread ⇒ peanut-butter; beer ⇒ bread
Frequent itemsets (items that frequently appear together): I = {bread, peanut-butter}; I = {beer, bread}
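
The transaction database above can be held in a small Python structure. This is a minimal sketch, not the lecture's code. It assumes each transaction is a frozenset of item names keyed by its TID, so that "transaction t contains itemset X" becomes the subset test X ⊆ t. The helper name `transactions_containing` is hypothetical.

```python
# Example transaction database from the slide (TID -> set of items).
TRANSACTIONS = {
    "T1": frozenset({"bread", "jelly", "peanut-butter"}),
    "T2": frozenset({"bread", "peanut-butter"}),
    "T3": frozenset({"bread", "milk", "peanut-butter"}),
    "T4": frozenset({"beer", "bread"}),
    "T5": frozenset({"beer", "milk"}),
}


def transactions_containing(itemset, db=TRANSACTIONS):
    """Return the sorted TIDs of the transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    # A transaction t contains X exactly when X is a subset of t.
    return sorted(tid for tid, items in db.items() if itemset <= items)


if __name__ == "__main__":
    print(transactions_containing({"bread", "peanut-butter"}))  # ['T1', 'T2', 'T3']
    print(transactions_containing({"beer", "bread"}))           # ['T4']
```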

What’s an Interesting Rule?
On the transaction database T1 to T5 above:
• Support count (σ): the frequency of occurrence of an itemset
  σ({bread, peanut-butter}) = 3
  σ({beer, bread}) = 1
• Support (s): the fraction of transactions that contain an itemset
  s({bread, peanut-butter}) = 3/5
  s({beer, bread}) = 1/5
• Frequent itemset: an itemset whose support is greater than or equal to a minimum support threshold (minsup)
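
The two definitions above translate directly into code. This is a sketch only: the function names are hypothetical. It uses Python's `fractions.Fraction` so that supports such as 3/5 stay exact rather than becoming floats.

```python
from fractions import Fraction

TRANSACTIONS = [
    {"bread", "jelly", "peanut-butter"},  # T1
    {"bread", "peanut-butter"},           # T2
    {"bread", "milk", "peanut-butter"},   # T3
    {"beer", "bread"},                    # T4
    {"beer", "milk"},                     # T5
]


def support_count(itemset, db=TRANSACTIONS):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in db if set(itemset) <= t)


def support(itemset, db=TRANSACTIONS):
    """s(X): fraction of transactions that contain X, kept as an exact fraction."""
    return Fraction(support_count(itemset, db), len(db))


def is_frequent(itemset, minsup, db=TRANSACTIONS):
    """X is a frequent itemset when s(X) >= minsup."""
    return support(itemset, db) >= minsup


if __name__ == "__main__":
    print(support_count({"bread", "peanut-butter"}))  # 3
    print(support({"bread", "peanut-butter"}))        # 3/5
    print(support({"beer", "bread"}))                 # 1/5
```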

What’s an Interesting Rule?
An association rule is an implication of two itemsets, X ⇒ Y (same transactions T1 to T5).
There are many measures of interest. The two most used are:
• Support (s): the occurring frequency of the rule, i.e., the fraction of the n transactions that contain both X and Y:
  $s(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{n}$
• Confidence (c): the strength of the association, i.e., a measure of how often the items in Y appear in transactions that contain X:
  $c(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}$
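
Here is a worked instance of the two formulas on T1 to T5. The inputs are n = 5, σ({bread}) = 4, σ({peanut-butter}) = 3 and σ({bread, peanut-butter}) = 3. The two rules share the same support but differ in confidence, because confidence divides by the support count of the antecedent.

```latex
s(\text{bread} \Rightarrow \text{peanut-butter})
  = \frac{\sigma(\{\text{bread}, \text{peanut-butter}\})}{n} = \frac{3}{5} = 0.60

c(\text{bread} \Rightarrow \text{peanut-butter})
  = \frac{\sigma(\{\text{bread}, \text{peanut-butter}\})}{\sigma(\{\text{bread}\})} = \frac{3}{4} = 0.75

c(\text{peanut-butter} \Rightarrow \text{bread})
  = \frac{\sigma(\{\text{bread}, \text{peanut-butter}\})}{\sigma(\{\text{peanut-butter}\})} = \frac{3}{3} = 1.00
```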

Interestingness of Rules
For the transactions T1 to T5 above:
Rule                        s      c
bread ⇒ peanut-butter       0.60   0.75
peanut-butter ⇒ bread       0.60   1.00
beer ⇒ bread                0.20   0.50
peanut-butter ⇒ jelly       0.20   0.33
jelly ⇒ peanut-butter       0.20   1.00
jelly ⇒ milk                0.00   0.00
There are many other interesting measures; the methods presented here are based on these two.
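
The table can be reproduced from the two formulas. This sketch uses hypothetical helper names. It assumes that confidence is reported as 0 when the antecedent never occurs, a convention chosen here and not stated on the slide.

```python
TRANSACTIONS = [
    {"bread", "jelly", "peanut-butter"},  # T1
    {"bread", "peanut-butter"},           # T2
    {"bread", "milk", "peanut-butter"},   # T3
    {"beer", "bread"},                    # T4
    {"beer", "milk"},                     # T5
]

RULES = [
    ({"bread"}, {"peanut-butter"}),
    ({"peanut-butter"}, {"bread"}),
    ({"beer"}, {"bread"}),
    ({"peanut-butter"}, {"jelly"}),
    ({"jelly"}, {"peanut-butter"}),
    ({"jelly"}, {"milk"}),
]


def support_count(itemset, db=TRANSACTIONS):
    """sigma(X): number of transactions containing X."""
    return sum(1 for t in db if itemset <= t)


def rule_measures(lhs, rhs, db=TRANSACTIONS):
    """Return (s, c) of lhs => rhs: s = sigma(X u Y) / n, c = sigma(X u Y) / sigma(X)."""
    both = support_count(lhs | rhs, db)
    lhs_count = support_count(lhs, db)
    # Assumption: report c = 0 when X never occurs (avoids dividing by zero).
    confidence = both / lhs_count if lhs_count else 0.0
    return both / len(db), confidence


if __name__ == "__main__":
    for lhs, rhs in RULES:
        s, c = rule_measures(lhs, rhs)
        rule = f"{', '.join(sorted(lhs))} => {', '.join(sorted(rhs))}"
        print(f"{rule:<28} {s:.2f} {c:.2f}")
```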

Types of AR
• Binary association rules: bread ⇒ peanut-butter
• Quantitative association rules: weight in [70kg-90kg] ⇒ height in [170cm-190cm]
• Fuzzy association rules: weight in TALL ⇒ height in TALL
Let’s start from the beginning: binary association rules, with Apriori.

Apriori
This is the most influential AR miner. It consists of two steps:
1. Generate all frequent itemsets whose support ≥ minsup
2. Use the frequent itemsets to generate association rules
So, let’s pay attention to the first step.

Apriori
[Figure: lattice of all itemsets over the items A, B, C, D, E]
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
Given d items, there are 2^d possible itemsets. Do I have to generate them all?
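
The lattice can be enumerated level by level with `itertools.combinations`, which confirms the 2^d count for d = 5. This is a sketch that assumes single-character item names, so each itemset can be printed by joining its letters. The function name `itemset_lattice` is hypothetical.

```python
from itertools import combinations


def itemset_lattice(items):
    """Return every subset of items, grouped by size from 0 (null) up to len(items)."""
    items = sorted(items)
    return [["".join(c) for c in combinations(items, k)] for k in range(len(items) + 1)]


if __name__ == "__main__":
    levels = itemset_lattice("ABCDE")
    for level in levels:
        print(" ".join(level) or "null")  # the empty itemset prints as "null"
    print(sum(len(level) for level in levels))  # 32, i.e. 2**5
```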

Apriori
Let’s avoid expanding the whole graph.
Key idea, the downward closure property: any subset of a frequent itemset is also a frequent itemset.
Therefore, the algorithm iteratively:
• creates candidate itemsets;
• only continues exploring those whose support ≥ minsup.
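
The downward closure property can be checked by brute force on the small example database. This is exactly the exhaustive enumeration that Apriori avoids, so it is useful only on a toy database. The minimum support count of 2 in the usage is a hypothetical threshold, and the helper names are illustrative.

```python
from itertools import combinations

TRANSACTIONS = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]


def support_count(itemset, db=TRANSACTIONS):
    return sum(1 for t in db if itemset <= t)


def all_frequent_itemsets(min_count, db=TRANSACTIONS):
    """Brute force over all 2^d - 1 non-empty itemsets (what Apriori avoids)."""
    items = sorted(set().union(*db))
    return {
        frozenset(combo)
        for k in range(1, len(items) + 1)
        for combo in combinations(items, k)
        if support_count(frozenset(combo), db) >= min_count
    }


def is_downward_closed(frequent):
    """True if every non-empty proper subset of each frequent itemset is also frequent."""
    return all(
        frozenset(sub) in frequent
        for itemset in frequent
        for k in range(1, len(itemset))
        for sub in combinations(itemset, k)
    )


if __name__ == "__main__":
    frequent = all_frequent_itemsets(min_count=2)  # hypothetical threshold
    print(sorted(sorted(s) for s in frequent))
    print(is_downward_closed(frequent))
```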

Example Itemset Generation
[Figure: the itemset lattice over A, B, C, D, E, with one itemset marked “Infrequent itemset”; by the downward closure property, its supersets are pruned and never generated.]

Recovering the Example
On the transaction database T1 to T5, with minimum support count = 3:
1-itemsets
Item            count
bread           4
peanut-butter   3
jelly           1
milk            2
beer            2
2-itemsets (built only from the frequent items)
Itemset                  count
{bread, peanut-butter}   3
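
The first two passes can be sketched in Python with a minimum support count of 3. The code uses `collections.Counter`. In the second pass it counts only pairs built from frequent items, which is where downward closure saves work. The code counts milk and beer twice each (milk in T3 and T5, beer in T4 and T5), and both stay below the threshold. Variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

TRANSACTIONS = [
    {"bread", "jelly", "peanut-butter"},  # T1
    {"bread", "peanut-butter"},           # T2
    {"bread", "milk", "peanut-butter"},   # T3
    {"beer", "bread"},                    # T4
    {"beer", "milk"},                     # T5
]
MIN_COUNT = 3  # minimum support count used on the slide

# Pass 1: count every single item.
item_counts = Counter(item for t in TRANSACTIONS for item in t)
frequent_items = {item for item, c in item_counts.items() if c >= MIN_COUNT}

# Pass 2: count only pairs whose items are both frequent (downward closure).
pair_counts = Counter(
    pair
    for t in TRANSACTIONS
    for pair in combinations(sorted(t & frequent_items), 2)
)
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= MIN_COUNT}

if __name__ == "__main__":
    print(dict(item_counts))
    print(frequent_pairs)  # {('bread', 'peanut-butter'): 3}
```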

Apriori Algorithm
k = 1
Generate the frequent itemsets of length 1
Repeat until no frequent itemsets are found:
    k := k + 1
    Generate candidate itemsets of size k from the frequent (k-1)-itemsets
    Compute the support of each candidate by scanning the DB
    Keep the candidates whose support ≥ minsup as the frequent k-itemsets

Apriori Algorithm
Algorithm Apriori(T)
    C1 ← init-pass(T);
    F1 ← {f | f ∈ C1, f.count/n ≥ minsup};      // n: no. of transactions in T
    for (k = 2; Fk-1 ≠ ∅; k++) do
        Ck ← candidate-gen(Fk-1);
        for each transaction t ∈ T do
            for each candidate c ∈ Ck do
                if c is contained in t then
                    c.count++;
            end
        end
        Fk ← {c ∈ Ck | c.count/n ≥ minsup}
    end
    return F ← ⋃k Fk;
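
Below is a runnable Python sketch of the pseudocode above, under several assumptions:
• Items are mutually comparable (e.g. strings).
• Itemsets are stored as sorted tuples.
• minsup is a fraction, as in f.count/n ≥ minsup.
• `candidate_gen` implements the usual join of two (k-1)-itemsets sharing their first k-2 items, followed by pruning of any candidate with an infrequent (k-1)-subset.

The names mirror the pseudocode but are otherwise illustrative.

```python
from itertools import combinations


def candidate_gen(prev_frequent, k):
    """Candidate k-itemsets from the frequent (k-1)-itemsets: join, then prune."""
    prev = sorted(prev_frequent)  # sorted tuples, so f1 < f2 lexicographically below
    prev_set = set(prev)
    candidates = set()
    for i, f1 in enumerate(prev):
        for f2 in prev[i + 1:]:
            if f1[:-1] == f2[:-1]:  # same first k-2 items, hence f1[-1] < f2[-1]
                c = f1 + (f2[-1],)  # join f1 and f2
                if all(s in prev_set for s in combinations(c, k - 1)):  # prune
                    candidates.add(c)
    return candidates


def apriori(transactions, minsup):
    """Return {itemset: support count} for every itemset with support >= minsup."""
    db = [frozenset(t) for t in transactions]
    n = len(db)  # n: no. of transactions in T
    # init-pass: count the 1-itemsets in one scan
    counts = {}
    for t in db:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c: cnt for c, cnt in counts.items() if cnt / n >= minsup}
    result = dict(frequent)
    k = 2
    while frequent:  # for (k = 2; Fk-1 non-empty; k++)
        candidates = candidate_gen(frequent, k)
        counts = dict.fromkeys(candidates, 0)
        for t in db:  # one scan of the database per level
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        frequent = {c: cnt for c, cnt in counts.items() if cnt / n >= minsup}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
    for itemset, count in sorted(apriori(tdb, minsup=0.5).items()):
        print(itemset, count)
```

Counting by looping over every candidate for every transaction keeps the sketch close to the pseudocode. Practical implementations usually index the candidates (for example, with a hash tree) to speed up this step.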

Apriori Algorithm
Function candidate-gen(Fk-1)
    Ck ← ∅;
    forall f1, f2 ∈ Fk-1
            with f1 = {i1, …, ik-2, ik-1}
            and f2 = {i1, …, ik-2, i’k-1}
            and ik-1 < i’k-1 do
        c ← {i1, …, ik-1, i’k-1};          // join f1 and f2
        Ck ← Ck ∪ {c};
        for each (k-1)-subset s of c do
            if (s ∉ Fk-1) then
                delete c from Ck;           // prune
        end
    end
    return Ck;
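
The join and prune steps of candidate-gen can be sketched on their own in Python. The sketch assumes each (k-1)-itemset is a sorted tuple, so that the condition ik-1 < i’k-1 becomes a lexicographic comparison. The usage shows a case where pruning matters. The 3-itemsets in it are a hypothetical input, not from the lecture: the join produces {a, b, c, d} and {a, c, d, e}, and the second is pruned because {a, d, e} is not frequent.

```python
from itertools import combinations


def candidate_gen(prev_frequent):
    """Candidate k-itemsets from the frequent (k-1)-itemsets (each a sorted tuple)."""
    prev = sorted(prev_frequent)
    if not prev:
        return set()
    k = len(prev[0]) + 1
    prev_set = set(prev)
    candidates = set()
    for i, f1 in enumerate(prev):
        for f2 in prev[i + 1:]:
            # Join: f1 and f2 share their first k-2 items. Because prev is
            # sorted, f1[-1] < f2[-1] then holds automatically.
            if f1[:-1] != f2[:-1]:
                continue
            c = f1 + (f2[-1],)
            # Prune: keep c only if every (k-1)-subset of c is frequent.
            if all(s in prev_set for s in combinations(c, k - 1)):
                candidates.add(c)
    return candidates


if __name__ == "__main__":
    # Hypothetical frequent 3-itemsets, chosen so that the prune step fires.
    f3 = {("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("a", "c", "e"), ("b", "c", "d")}
    print(candidate_gen(f3))  # {('a', 'b', 'c', 'd')}
```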

Example of Apriori Run
Database TDB (the tables imply a minimum support count of 2):
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan → C1: {A} 2, {B} 3, {C} 3, {D} 1, {E} 3
L1: {A} 2, {B} 3, {C} 3, {E} 3

C2 (from L1): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
2nd scan → C2: {A, B} 1, {A, C} 2, {A, E} 1, {B, C} 2, {B, E} 3, {C, E} 2
L2: {A, C} 2, {B, C} 2, {B, E} 3, {C, E} 2

C3 (from L2): {B, C, E}
3rd scan → L3: {B, C, E} 2
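
A small Python trace can reproduce these tables scan by scan, printing each candidate set Ck with its counts and the frequent set Lk. It is a sketch that assumes a minimum support count of 2, which the slide's tables imply but do not state. Itemsets are sorted tuples, and the function names are hypothetical.

```python
from itertools import combinations

# Database TDB from the slide: TID -> items.
TDB = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}
MIN_COUNT = 2  # assumption: minimum support count implied by the tables


def scan(candidates, db):
    """One pass over the database: support count of each candidate (sorted tuple)."""
    return {c: sum(1 for items in db.values() if set(c) <= items) for c in candidates}


def next_candidates(frequent):
    """Join frequent itemsets that share all but their last item, then prune."""
    prev = sorted(frequent)
    k = len(prev[0]) + 1
    out = []
    for i, f1 in enumerate(prev):
        for f2 in prev[i + 1:]:
            if f1[:-1] == f2[:-1]:
                c = f1 + (f2[-1],)
                if all(s in frequent for s in combinations(c, k - 1)):
                    out.append(c)
    return out


def trace(db, min_count):
    """Run the level-wise search and return [(Ck counts, Lk), ...], one pair per scan."""
    candidates = sorted({(item,) for items in db.values() for item in items})
    levels = []
    while candidates:
        counts = scan(candidates, db)
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        levels.append((counts, frequent))
        k = len(levels)
        print(f"scan {k}: C{k} = {counts}")
        print(f"        L{k} = {frequent}")
        if not frequent:
            break
        candidates = next_candidates(frequent)
    return levels


if __name__ == "__main__":
    trace(TDB, MIN_COUNT)
```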

Apriori
Remember that Apriori consists of two steps:
1. Generate all frequent itemsets whose support ≥ minsup
2. Use the frequent itemsets to generate association rules
We accomplished step 1, so we have all frequent itemsets. Now, let’s pay attention to the second step.

Rule Generation in Apriori
Given a frequent itemset L:
• Find all non-empty proper subsets F ⊂ L such that the association rule F ⇒ {L − F} satisfies the minimum confidence
• Create the rule F ⇒ {L − F}
If L = {A, B, C}, the candidate rules are: AB ⇒ C, AC ⇒ B, BC ⇒ A, A ⇒ BC, B ⇒ AC, C ⇒ AB
In general, there are 2^k − 2 candidate rules, where k is the length of the itemset L.
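
The naive rule generation described above can be sketched in Python. The first function enumerates every rule F ⇒ L − F for the non-empty proper subsets F of L. The second keeps the rules whose confidence σ(L)/σ(F) reaches minconf. The usage applies this to {B, C, E} from the Apriori run example with a minimum confidence of 0.8, which is a hypothetical threshold. The function names are illustrative.

```python
from itertools import combinations


def candidate_rules(itemset):
    """All 2^k - 2 rules F => L - F, for the non-empty proper subsets F of L."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):  # |F| runs from 1 to k-1
        for lhs in combinations(items, size):
            rhs = tuple(i for i in items if i not in lhs)
            rules.append((lhs, rhs))
    return rules


def confident_rules(itemset, db, minconf):
    """Keep the rules whose confidence sigma(L) / sigma(F) is at least minconf."""
    def sigma(x):
        return sum(1 for t in db if set(x) <= t)

    sigma_l = sigma(itemset)
    return [(lhs, rhs) for lhs, rhs in candidate_rules(itemset)
            if sigma_l / sigma(lhs) >= minconf]


if __name__ == "__main__":
    for lhs, rhs in candidate_rules({"A", "B", "C"}):
        print("".join(lhs), "=>", "".join(rhs))
    tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
    print(confident_rules({"B", "C", "E"}, tdb, minconf=0.8))  # hypothetical minconf
```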

Can You Be More Efficient?
Can we apply the same trick used with support?
Confidence does not have the anti-monotone property. That is, is c(AB ⇒ D) > c(A ⇒ D)? We don’t know!
But the confidence of rules generated from the same itemset does have the anti-monotone property:
L = {A, B, C, D}: c(ABC ⇒ D) ≥ c(AB ⇒ CD) ≥ c(A ⇒ BCD)
We can apply this property to prune the rule generation.
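
The ordering follows in one step from the confidence formula, assuming σ(L) > 0 (L is frequent). Every rule generated from L has the same numerator σ(L). Moving items from the antecedent to the consequent shrinks the antecedent, and a smaller itemset can only have an equal or larger support count. Rules from different itemsets, such as AB ⇒ D and A ⇒ D, have different numerators (σ(ABD) and σ(AD)), so no such ordering is guaranteed.

```latex
c(X \Rightarrow L \setminus X) = \frac{\sigma(L)}{\sigma(X)}, \qquad
X' \subseteq X \subseteq L \;\Longrightarrow\; \sigma(X') \ge \sigma(X)
\;\Longrightarrow\; c(X' \Rightarrow L \setminus X') \le c(X \Rightarrow L \setminus X)

\text{For } L = \{A,B,C,D\}:\quad
\frac{\sigma(ABCD)}{\sigma(ABC)} \ge \frac{\sigma(ABCD)}{\sigma(AB)} \ge \frac{\sigma(ABCD)}{\sigma(A)},
\ \text{i.e.}\ c(ABC \Rightarrow D) \ge c(AB \Rightarrow CD) \ge c(A \Rightarrow BCD)
```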

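Example of Efficient Rule Generation
[Figure: lattice of the rules generated from the frequent itemset ABCD, with one rule marked “Low confidence”]
ABCD
ABC⇒D  ABD⇒C  ACD⇒B  BCD⇒A
AB⇒CD  AC⇒BD  BC⇒AD  AD⇒BC  BD⇒AC  CD⇒AB
A⇒BCD  B⇒ACD  C⇒ABD  D⇒ABC
When a rule has low confidence, every rule below it whose consequent contains that rule’s consequent also has low confidence and can be pruned.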
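
The pruning idea in the figure can be sketched as level-wise rule generation from one frequent itemset L. Consequents grow one item at a time, and only consequents whose rule reached minconf are extended. This is an illustration of the idea, not necessarily the lecture's exact procedure. The usage reuses TDB and {B, C, E} from the Apriori run example with a hypothetical minconf of 0.8. Function names are illustrative.

```python
from itertools import combinations

# Database TDB from the Apriori run example (items per transaction).
TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]


def support_count(itemset, db):
    return sum(1 for t in db if set(itemset) <= t)


def gen_rules(itemset, db, minconf):
    """Level-wise rule generation from one frequent itemset L.

    A consequent is extended only if its rule reached minconf: moving more
    items into the consequent cannot raise confidence, so the skipped rules
    could never qualify.
    """
    L = tuple(sorted(itemset))
    sigma_l = support_count(L, db)
    rules = []
    consequents = [(item,) for item in L]  # level 1: one-item consequents
    m = 1
    while consequents and m < len(L):  # the antecedent must stay non-empty
        kept = []
        for rhs in consequents:
            lhs = tuple(i for i in L if i not in rhs)
            conf = sigma_l / support_count(lhs, db)
            if conf >= minconf:
                rules.append((lhs, rhs, conf))
                kept.append(rhs)
        kept.sort()
        kept_set = set(kept)
        # Join surviving consequents sharing all but their last item; keep the
        # new consequent only if all of its m-item subsets survived.
        consequents = []
        for i, a in enumerate(kept):
            for b in kept[i + 1:]:
                if a[:-1] == b[:-1]:
                    c = a + (b[-1],)
                    if all(s in kept_set for s in combinations(c, m)):
                        consequents.append(c)
        m += 1
    return rules


if __name__ == "__main__":
    for lhs, rhs, conf in gen_rules({"B", "C", "E"}, TDB, minconf=0.8):
        print("".join(lhs), "=>", "".join(rhs), f"c={conf:.2f}")
```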

Challenges in AR Mining
Challenges:
• Apriori scans the database multiple times
• Most often, there is a high number of candidates
• Support counting for candidates can be time-expensive
Several methods try to improve these points by:
• Reducing the number of scans of the database
• Shrinking the number of candidates
• Counting the support of candidates more efficiently

Next Class
Advanced topics in association rule mining
