Lecture14 - Advanced topics in association rules


Published on March 5, 2009

Author: aorriols

Source: slideshare.net

Introduction to Machine Learning
Lecture 14: Advanced Topics in Association Rules Mining
Albert Orriols i Puig (aorriols@salle.url.edu)
Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull

Recap of Lecture 13
Ideas come from market basket analysis (MBA). Let's go shopping!
Customer1: milk, eggs, sugar, bread
Customer2: milk, eggs, cereal, bread
Customer3: eggs, sugar
What do my customers buy? Which products are bought together?
Aim: find associations and correlations between the different items that customers place in their shopping baskets.
Slide 2 Artificial Intelligence Machine Learning

Recap of Lecture 13
Database TDB
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E
1st scan, C1 (itemset:sup): {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
L1: {A}:2, {B}:3, {C}:3, {E}:3
C2: {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
2nd scan, C2 (itemset:sup): {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2
C3: {B,C,E}
3rd scan, L3: {B,C,E}:2
Slide 3
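The level-wise procedure recapped on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function name `apriori` and the minimum support of 2 are assumptions (the threshold is inferred from the slide, where {D} with support 1 is dropped and itemsets with support 2 are kept).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the database.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One scan of the database per level to count candidate supports.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Database TDB from the slide; min_support=2 is an assumed threshold.
tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(tdb, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

On this database the only 3-item candidate that survives pruning is {B, C, E}, which matches C3 on the slide.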

Recap of Lecture 13
Challenges:
- Apriori scans the database multiple times
- Most often, there is a high number of candidates
- Support counting for candidates can be time-expensive
Several methods try to improve on these points by:
- Reducing the number of scans of the database
- Shrinking the number of candidates
- Counting the support of candidates more efficiently
Slide 4

Today’s Agenda
Starting a journey through some advanced topics in ARM:
- Mining frequent patterns without candidate generation
- Multiple-level AR
- Sequential pattern mining
- Quantitative association rules
- Mining class association rules
- Beyond support & confidence
- Applications
Slide 5

Revisiting Candidate Generation
Remember Apriori?
- Use the previous frequent (k-1)-itemsets to generate the k-itemsets
- Count itemset support by scanning the database
Bottleneck in the process: candidate generation
Suppose 100 items:
- First level of the tree: 100 nodes
- Second level of the tree: $\binom{100}{2}$ nodes
- In general, number of k-itemsets: $\binom{100}{k}$
Slide 6
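To get a feel for how quickly these binomial counts grow, the short sketch below evaluates them for the 100-item case. The variable names are illustrative only.

```python
from math import comb

n_items = 100  # number of distinct items, as on the slide

# Number of possible k-itemsets for the first few levels of the lattice.
candidates_per_level = {k: comb(n_items, k) for k in range(1, 4)}
print(candidates_per_level)  # {1: 100, 2: 4950, 3: 161700}

# Summed over every level, the non-empty itemsets number 2**100 - 1.
total = sum(comb(n_items, k) for k in range(1, n_items + 1))
print(total == 2**n_items - 1)  # True
```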

Can We Avoid Generation?
Build an auxiliary structure that gathers statistics about the itemsets in order to avoid candidate generation.
Use an FP-tree:
- Avoid multiple scans of the data
- Divide-and-conquer methodology
- Avoid candidate generation
Outline of the process:
- Generate an FP-tree
- Mine the FP-tree
Slide 7

Building the FP-Tree
TID  Items                Sorted FIS
1    {F,A,C,D,G,I,M,P}    {F,C,A,M,P}
2    {A,B,C,F,L,M,O}      {F,C,A,B,M}
3    {B,F,H,J,O}          {F,B}
4    {B,C,K,S,P}          {C,B,P}
5    {A,F,C,E,L,P,M,N}    {F,C,A,M,P}
Scan the DB for the first time and identify the frequent items. They are: <(f:4), (c:4), (a:3), (b:3), (m:3), (p:3)>
In the last column, the frequent items of each transaction are sorted according to their frequency.
Slide 8
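The first scan can be sketched as follows. Two things here are assumptions rather than statements from the slide: the minimum support of 3 (inferred because items with count 2, such as L and O, do not appear among the frequent items) and the tie-breaking order F, C, A, B, M, P, which is copied from the slide since items with equal counts can be ordered arbitrarily.

```python
from collections import Counter

transactions = [
    list("FACDGIMP"),
    list("ABCFLMO"),
    list("BFHJO"),
    list("BCKSP"),
    list("AFCELPMN"),
]
min_support = 3  # assumed threshold, inferred from the slide

# First scan: count how many transactions contain each item.
counts = Counter(item for t in transactions for item in t)
frequent = {item: n for item, n in counts.items() if n >= min_support}

# Frequency-descending order; ties are broken as on the slide (assumption).
rank = {item: position for position, item in enumerate("FCABMP")}

# Keep only the frequent items of each transaction, in rank order.
sorted_fis = [
    sorted((item for item in t if item in frequent), key=rank.get)
    for t in transactions
]
for tid, items in enumerate(sorted_fis, start=1):
    print(tid, items)
```

The resulting lists reproduce the "Sorted FIS" column, which is the input for the second scan that builds the tree.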

Building the FP-Tree
Scan the DB again to build the tree.
After reading TID1, {F,C,A,M,P}:
root
  F:1
    C:1
      A:1
        M:1
          P:1
Slide 9

Building the FP-Tree
After reading TID2, {F,C,A,B,M}:
root
  F:2
    C:2
      A:2
        M:1
          P:1
        B:1
          M:1
Slide 10

Building the FP-Tree
After reading TID3, {F,B}:
root
  F:3
    C:2
      A:2
        M:1
          P:1
        B:1
          M:1
    B:1
Slide 11

Building the FP-Tree
After reading TID4, {C,B,P}:
root
  F:3
    C:2
      A:2
        M:1
          P:1
        B:1
          M:1
    B:1
  C:1
    B:1
      P:1
Slide 12

Building the FP-Tree
After reading TID5, {F,C,A,M,P}:
root
  F:4
    C:3
      A:3
        M:2
          P:2
        B:1
          M:1
    B:1
  C:1
    B:1
      P:1
Slide 13

Building the FP-Tree
Final tree, with an item index (F, C, A, B, M, P) whose entries link to the nodes of each item:
root
  F:4
    C:3
      A:3
        M:2
          P:2
        B:1
          M:1
    B:1
  C:1
    B:1
      P:1
Build an index to access the nodes quickly and traverse the tree.
Slide 14
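The second scan and the item index can be sketched as below. This is one possible layout under stated assumptions, not the lecture's code: the names `Node`, `build_fp_tree` and `header` are hypothetical, and the index is kept as a plain dictionary of node lists rather than as chained node-links.

```python
class Node:
    """One FP-tree node: an item, its count, its parent and its children."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_fp_tree(sorted_transactions):
    root = Node(None, None)
    header = {}  # item -> list of nodes carrying that item (the index)
    for transaction in sorted_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                # First time this prefix is seen: create a new branch node.
                child = Node(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1  # shared prefixes only increase the count
            node = child
    return root, header

# "Sorted FIS" column from the slides.
sorted_fis = [list("FCAMP"), list("FCABM"), list("FB"), list("CBP"), list("FCAMP")]
root, header = build_fp_tree(sorted_fis)

print({item: n.count for item, n in root.children.items()})  # {'F': 4, 'C': 1}
print([n.count for n in header["P"]])                        # [2, 1]
```

Because transactions that share a prefix share nodes, the five transactions collapse into eleven nodes, and the index gives direct access to every occurrence of an item.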

Mining the FP-Tree
Properties to mine the FP-tree
Node-link property: all possible itemsets in which the frequent item a is included can be found by following a’s node-links.
Item P has a support of 3. There are two paths in the FP-tree for node P:
1. {F,C,A,M,P}
2. {C,B,P}
(Figure: the FP-tree and item index from slide 14.)
Slide 15
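The node-link property can be checked on a hand-encoded copy of the tree from slide 14. The table layout and the helper names `node_links` and `path_from_root` are illustrative assumptions; a node-link is emulated here by simply collecting the rows that carry the same item.

```python
# Each row: (item, count, index of the parent row); None marks a child of the root.
nodes = [
    ("F", 4, None),  # 0
    ("C", 3, 0),     # 1
    ("A", 3, 1),     # 2
    ("M", 2, 2),     # 3
    ("P", 2, 3),     # 4
    ("B", 1, 2),     # 5
    ("M", 1, 5),     # 6
    ("B", 1, 0),     # 7
    ("C", 1, None),  # 8
    ("B", 1, 8),     # 9
    ("P", 1, 9),     # 10
]

def node_links(item):
    """Indices of all nodes carrying `item`."""
    return [i for i, (name, _, _) in enumerate(nodes) if name == item]

def path_from_root(index):
    """Items on the path from the root down to the given node."""
    path = []
    while index is not None:
        path.append(nodes[index][0])
        index = nodes[index][2]
    return path[::-1]

p_nodes = node_links("P")
p_support = sum(nodes[i][1] for i in p_nodes)
p_paths = [path_from_root(i) for i in p_nodes]

print(p_support)  # 3
print(p_paths)    # [['F', 'C', 'A', 'M', 'P'], ['C', 'B', 'P']]
```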

Mining the FP-Tree
Prefix path property: to calculate the frequent patterns for a node a in a path P, only the prefix subpath of node a in P needs to be accumulated, and every node in the prefix path should carry the same frequency count as node a.
Node P is involved in the path (F:4, C:3, A:3, M:2, P:2).
- Take the prefix of the path until M: (F:4, C:3, A:3)
- Adjust counts to 2, the count of M: (F:2, C:2, A:2)
- So F, C, and A co-occur with M
(Figure: the FP-tree and item index from slide 14.)
Slide 16
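One way to see why the prefix counts are adjusted is to bypass the tree and count prefixes directly in the sorted transactions, which is the information the tree compresses. The sketch below is a cross-check under that assumption, not a tree traversal; `conditional_pattern_base` is a hypothetical helper name, and each transaction is written as a string of single-letter items.

```python
from collections import Counter

# "Sorted FIS" column from slide 8, one string per transaction.
sorted_fis = ["FCAMP", "FCABM", "FB", "CBP", "FCAMP"]

def conditional_pattern_base(item):
    """Count the prefixes that precede `item` in the sorted transactions."""
    base = Counter()
    for transaction in sorted_fis:
        if item in transaction:
            prefix = transaction[: transaction.index(item)]
            if prefix:
                base[prefix] += 1
    return base

base_m = conditional_pattern_base("M")
print(dict(base_m))  # {'FCA': 2, 'FCAB': 1}
```

The prefix F, C, A gets count 2 because only two transactions follow that path straight into M, even though the nodes F, C and A carry counts 4, 3 and 3 in the full tree.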

Mining the FP-Tree
Fragment growth: let α be an itemset in DB, B be α’s conditional pattern base, and β be an itemset in B. Then the support of α ∪ β in DB is equivalent to the support of β in B.
For M, we had the prefix paths (F:2, C:2, A:2) and (F:1, C:1, A:1, B:1).
Therefore, adding the counts of both paths: {(F,C,A,M):3}, {(F,C,M):3}, …
(Figure: tree built from M's prefix paths.)
Slide 17
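The fragment growth property can be verified numerically on the lecture's database. In this sketch α = {M} and B is M's conditional pattern base written out by hand; the helper names are illustrative. For β = {F, C, A}, both sides come out as 3, that is, 2 from the first prefix path plus 1 from the second.

```python
from itertools import combinations

# Original transactions from slide 8.
db = [set("FACDGIMP"), set("ABCFLMO"), set("BFHJO"), set("BCKSP"), set("AFCELPMN")]

alpha = {"M"}
# B: conditional pattern base of M as (prefix path, count) pairs.
base = [({"F", "C", "A"}, 2), ({"F", "C", "A", "B"}, 1)]

def support_in_db(itemset):
    return sum(1 for transaction in db if itemset <= transaction)

def support_in_base(itemset):
    return sum(count for path, count in base if itemset <= path)

# Compare both supports for every non-empty itemset beta that occurs in B.
items = sorted(set().union(*(path for path, _ in base)))
checks = {}
for size in range(1, len(items) + 1):
    for beta in combinations(items, size):
        checks[beta] = (support_in_db(alpha | set(beta)), support_in_base(set(beta)))

print(checks[("A", "C", "F")])  # (3, 3)
print(all(left == right for left, right in checks.values()))  # True
```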

Is FP-growth Faster than Apriori?
As the support threshold goes down, the number of itemsets increases dramatically. FP-growth does not need to generate candidates and test them.
Slide 18

Is FP-growth Faster than Apriori?
Both FP-growth and Apriori scale linearly with the number of transactions, but FP-growth is more efficient.
Slide 19

Next Class
Advanced topics in association rule mining
Slide 20
