Lecture 15 - Advanced topics on association rules, Part II


Published on March 9, 2009

Author: aorriols

Source: slideshare.net

Introduction to Machine Learning
Lecture 15: Advanced Topics in Association Rules Mining
Albert Orriols i Puig
http://www.albertorriols.net
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull

Recap of Lecture 13-14
Ideas come from market basket analysis (MBA).
Let’s go shopping!
  Customer1: milk, eggs, sugar, bread
  Customer2: milk, eggs, cereal, bread
  Customer3: eggs, sugar
What do my customers buy? Which products are bought together?
Aim: find associations and correlations between the different items that customers place in their shopping basket.
Slide 2

Recap of Lecture 13-14
Apriori
  Will find all the associations with minimum support and confidence.
  However:
    It scans the database multiple times.
    Most often, there is a high number of candidates.
    Support counting for candidates can be time-expensive.
FP-growth
  Will obtain the same rules as Apriori.
  Avoids candidate generation by building an FP-tree.
  Counts support more efficiently.
Slide 3

Today’s Agenda
Continuing our journey through some advanced topics in ARM:
  Mining frequent patterns without candidate generation
  Multiple-level AR
  Sequential pattern mining
  Quantitative association rules
  Mining class association rules
  Beyond support & confidence
  Applications
Slide 4

Acknowledgments
Part of this lecture is based on the work by
Slide 5

Why Multiple Level AR?
Aim: find associations between items.
But wait!
  There are many different diapers: Dodot, Huggies, …
  There are many different beers: Heineken, Desperados, Kingfisher, …, in bottle or can, …
Which rule do you prefer?
  diapers ⇒ beer
  Dodot diapers M ⇒ Dam beer in can
Which will have greater support?
Slide 6

Concept Hierarchy
Create is-a hierarchies:
  Clothes
    Outwear
      Jackets
      Ski Pants
    Shirts
  Footwear
    Shoes
    Hiking Boots
Assume we found the rule Outwear ⇒ Hiking Boots. Then:
  Jackets ⇒ Hiking Boots may not have minimum support.
  Clothes ⇒ Hiking Boots may not have minimum confidence.
Slide 7

Concept Hierarchy
This means that:
  Items at lower levels may not have enough support to be part of any frequent itemset.
  However, rules at a lower level of the hierarchy, even if overspecific, may denote a strong association, e.g. Jackets ⇒ Hiking Boots.
So, which rules do you want?
  Users are interested in generating rules that span different levels of the taxonomy.
  Rules at lower levels may not have minimum support.
  The taxonomy can be used to prune uninteresting or redundant rules.
Multiple taxonomies may be present.
  For example: category, price (cheap, expensive), “items-on-sale”, etc.
  Multiple taxonomies may be modeled as a forest, or a DAG.
Slide 8

Notation
[Figure: taxonomy graph. Each edge is an is_a relationship between a parent p and its children c1, c2. The nodes above an item z are its ancestors (marked with ^); the nodes below it are its descendants.]
Slide 9

Notation
Formalizing the problem:
  I = {i1, i2, …, im}: the set of items
  T: a transaction, a set of items with T ⊆ I
  D: the set of transactions
  T supports an item x if x is in T or x is an ancestor of some item in T.
  T supports X ⊆ I if it supports every item in X.
  Generalized association rule: X ⇒ Y, where X ⊂ I, Y ⊂ I, X ∩ Y = ∅, and no item in Y is an ancestor of any item in X. The last condition excludes rules such as jacket ⇒ clothes, which are trivially true.
  The rule X ⇒ Y has confidence c in D if c% of the transactions in D that support X also support Y.
  The rule X ⇒ Y has support s in D if s% of the transactions in D support X ∪ Y.
Slide 10
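The definitions above can be sketched in a few lines of Python. This is only an illustration: the `PARENT` dictionary is an assumed encoding of the Clothes/Footwear taxonomy (with a single parent per item), and all function names are hypothetical.

```python
# Hypothetical encoding of the taxonomy: each item maps to its parent.
PARENT = {
    "Jacket": "Outwear", "Ski Pants": "Outwear",
    "Outwear": "Clothes", "Shirt": "Clothes",
    "Shoes": "Footwear", "Hiking Boots": "Footwear",
}

def ancestors(item):
    """Return every ancestor of `item` by walking up the taxonomy."""
    found = set()
    while item in PARENT:
        item = PARENT[item]
        found.add(item)
    return found

def supports(transaction, itemset):
    """T supports X if each x in X is in T or is an ancestor of an item in T."""
    reachable = set(transaction)
    for item in transaction:
        reachable |= ancestors(item)
    return set(itemset) <= reachable

def support(db, itemset):
    """Fraction of the transactions in `db` that support `itemset`."""
    return sum(supports(t, itemset) for t in db) / len(db)

def confidence(db, x, y):
    """Fraction of the transactions supporting X that also support Y."""
    return support(db, set(x) | set(y)) / support(db, x)

def is_valid_rule(x, y):
    """X and Y are disjoint and no item in Y is an ancestor of an item in X."""
    x, y = set(x), set(y)
    return not (x & y) and not any(ancestors(i) & y for i in x)

# A toy database, invented for this sketch only.
db = [{"Jacket", "Hiking Boots"}, {"Shirt"}]
print(support(db, {"Clothes"}))                 # both transactions support Clothes
print(is_valid_rule({"Jacket"}, {"Clothes"}))   # rejected: Clothes is an ancestor of Jacket
```

Note that `supports` never modifies the transaction; it only checks membership against the items reachable through the taxonomy.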

So, Let’s Re-state the Problem
New aim: find all generalized association rules whose support and confidence are greater than the user-specified minimum support (called minsup) and minimum confidence (called minconf), respectively.
[Figure: the Clothes / Footwear taxonomy from the Concept Hierarchy slide]
The antecedent and consequent may contain items from any level of the hierarchy.
Do you see any potential problem? I can find many redundant rules!
Slide 11

Mining the Example
minsup = 30%, minconf = 60%
Database D
  Transaction  Items Bought
  100          Shirt
  200          Jacket, Hiking Boots
  300          Ski Pants, Hiking Boots
  400          Shoes
  500          Shoes
  600          Jacket
Frequent Itemsets
  Itemset                   Support
  {Jacket}                  2
  {Outwear}                 3
  {Clothes}                 4
  {Shoes}                   2
  {Hiking Boots}            2
  {Footwear}                4
  {Outwear, Hiking Boots}   2
  {Clothes, Hiking Boots}   2
  {Outwear, Footwear}       2
  {Clothes, Footwear}       2
Rules
  Rule                      Support  Confidence
  Outwear ⇒ Hiking Boots    33%      66.6%
  Outwear ⇒ Footwear        33%      66.6%
  Hiking Boots ⇒ Outwear    33%      100%
  Hiking Boots ⇒ Clothes    33%      100%
Slide 12
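The frequent itemsets of this example can be reproduced with a brute-force enumeration. The sketch below is not one of the algorithms from the lecture: it simply tries every itemset of size 1 and 2, skips itemsets that mix an item with one of its own ancestors, and counts support on transactions extended with ancestors. The `PARENT` encoding and all names are assumptions made for the illustration.

```python
from itertools import combinations

# Hypothetical encoding of the taxonomy: each item maps to its parent.
PARENT = {
    "Jacket": "Outwear", "Ski Pants": "Outwear",
    "Outwear": "Clothes", "Shirt": "Clothes",
    "Shoes": "Footwear", "Hiking Boots": "Footwear",
}

# Database D from the slide (transactions 100 to 600).
DB = [
    {"Shirt"},
    {"Jacket", "Hiking Boots"},
    {"Ski Pants", "Hiking Boots"},
    {"Shoes"},
    {"Shoes"},
    {"Jacket"},
]

def ancestors(item):
    """Return every ancestor of `item` by walking up the taxonomy."""
    found = set()
    while item in PARENT:
        item = PARENT[item]
        found.add(item)
    return found

# Extend each transaction with the ancestors of its items.
EXTENDED = [set(t).union(*(ancestors(i) for i in t)) for t in DB]

def count(itemset):
    """Number of extended transactions that contain every item of `itemset`."""
    return sum(set(itemset) <= t for t in EXTENDED)

def mixes_item_and_ancestor(itemset):
    """True if the itemset contains an item together with one of its ancestors."""
    return any(ancestors(i) & set(itemset) for i in itemset)

MIN_COUNT = 0.30 * len(DB)  # minsup = 30% of 6 transactions
ITEMS = sorted(set().union(*EXTENDED))

frequent = {}
# No transaction holds more than two leaf items, so sizes 1 and 2 are enough here.
for k in range(1, 3):
    for candidate in combinations(ITEMS, k):
        if mixes_item_and_ancestor(candidate):
            continue
        n = count(candidate)
        if n >= MIN_COUNT:
            frequent[frozenset(candidate)] = n

for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Exhaustive enumeration is only workable on a toy database like this one; the level-wise algorithms later in the lecture exist precisely to avoid it.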

Mining the Example
Observation 1
If the set {x, y} has minimum support, so do {x^, y}, {x, y^} and {x^, y^}.
  E.g.: if {Jacket, Shoes} has minsup, then {Outwear, Shoes}, {Jacket, Footwear}, and {Outwear, Footwear} also have minimum support.
Slide 13

Mining the Example
Observation 2
If the rule x ⇒ y has minimum support and confidence, then x ⇒ y^ is guaranteed to have both minsup and minconf.
  E.g.: the rule Outwear ⇒ Hiking Boots has minsup and minconf, so the rule Outwear ⇒ Footwear has both minsup and minconf.
However, while the rules x^ ⇒ y and x^ ⇒ y^ will have minsup, they may not have minconf.
  E.g.: Clothes ⇒ Hiking Boots and Clothes ⇒ Footwear have minsup, but not minconf.
Slide 14
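The reasoning behind Observation 2 can be written out as a short derivation. The notation s(·) for support and c(·) for confidence is assumed here, and the numbers come from the example database (minconf = 60%).

```latex
% Generalizing the consequent can only keep or increase the numerator:
c(x \Rightarrow \hat{y})
  = \frac{s(\{x, \hat{y}\})}{s(\{x\})}
  \ge \frac{s(\{x, y\})}{s(\{x\})}
  = c(x \Rightarrow y)

% Generalizing the antecedent increases numerator AND denominator,
% so nothing can be said about the ratio:
c(\hat{x} \Rightarrow y)
  = \frac{s(\{\hat{x}, y\})}{s(\{\hat{x}\})},
\qquad s(\{\hat{x}, y\}) \ge s(\{x, y\}),
\quad s(\{\hat{x}\}) \ge s(\{x\})

% Example: the counts over the 6 transactions give
c(\text{Outwear} \Rightarrow \text{Hiking Boots}) = \tfrac{2}{3} \approx 66.6\%,
\qquad
c(\text{Clothes} \Rightarrow \text{Hiking Boots}) = \tfrac{2}{4} = 50\% < 60\%
```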

Interesting Rules
So, in which rules are we interested?
Up to now, interest was measured by how much the support of a rule exceeded the expected support based on the supports of the antecedent and the consequent.
  But this does not consider the taxonomy.
  I have poor pruning… But now, I need to prune a lot!
Srikant and Agrawal proposed a different approach.
  Consider that Milk ⇒ cereal [s=0.08, c=0.70]
  and that Skim milk ⇒ cereal [s=0.02, c=0.70].
  [Figure: Milk is the parent of 2% Milk and Skim Milk, each annotated with its support s]
So, do you think that the second rule is important? Maybe not!
Slide 15

Interesting Rules
A rule X ⇒ Y is R-interesting w.r.t. an ancestor X^ ⇒ Y^ if:
  real s(X ⇒ Y) > R · expected s(X ⇒ Y) based on (X^ ⇒ Y^)
or
  real c(X ⇒ Y) > R · expected c(X ⇒ Y) based on (X^ ⇒ Y^)
Aim: interesting rules will be those whose support is more than R times the expected value, or whose confidence is more than R times the expected value, for some user-specified constant R.
Slide 16

Interesting Rules
What’s the expected value?
A method is defined to compute the expected value:
  $$E_{\hat{Z}}[\Pr(Z)] = \frac{\Pr(z_1)}{\Pr(\hat{z}_1)} \times \dots \times \frac{\Pr(z_j)}{\Pr(\hat{z}_j)} \times \Pr(\hat{Z})$$
where Z^ is an ancestor of Z. Go to the papers for the details.
Now we aim at finding all generalized R-interesting association rules (R is a user-specified minimum interest called min-interest) that have support and confidence greater than minsup and minconf, respectively.
Slide 17
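As a rough illustration of the expected-value formula, the sketch below applies it to the milk example with Z = {skim milk, cereal} and Z^ = {milk, cereal}. The item supports `PR_MILK` and `PR_SKIM` and the threshold R = 1.1 are invented for this sketch (they are not given in the lecture), and the function names are hypothetical.

```python
from math import prod

def expected_support(replaced, ancestor_support):
    """Expected Pr(Z) given the ancestor itemset Z^.

    `replaced` lists the pairs (Pr(z_i), Pr(z^_i)) for each item of Z that
    is replaced by an ancestor in Z^; `ancestor_support` is Pr(Z^).
    """
    return prod(p / p_hat for p, p_hat in replaced) * ancestor_support

def is_r_interesting(actual, expected, r):
    """A rule is R-interesting if its actual value exceeds R times the expected one."""
    return actual > r * expected

# Assumed item supports: skim milk accounts for a quarter of milk sales.
PR_MILK, PR_SKIM = 0.20, 0.05

# Milk => cereal has support 0.08; skim milk => cereal has support 0.02.
expected = expected_support([(PR_SKIM, PR_MILK)], ancestor_support=0.08)
interesting = is_r_interesting(actual=0.02, expected=expected, r=1.1)
print(expected, interesting)
```

Under these assumed numbers the specialized rule has exactly the support one would expect from its ancestor, so it carries no extra information and is not R-interesting.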

Algorithms to Mine General AR
Follow three steps:
  1. Find all itemsets whose support is greater than minsup. These itemsets are called frequent itemsets.
  2. Use the frequent itemsets to generate the desired rules: if ABCD and AB are frequent, then conf(AB ⇒ CD) = support(ABCD) / support(AB).
  3. Prune all uninteresting rules from this set.
Different algorithms for this purpose:
  Basic
  Cumulate
  EstMerge
Slide 18
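Step 2 can be sketched as follows: given the support counts of the frequent itemsets, every non-empty proper subset of an itemset is tried as an antecedent and the rule is kept if its confidence reaches minconf. The function name `generate_rules` is hypothetical, the counts are the ones from the Mining the Example slide, and step 3 (interest-based pruning) is not covered.

```python
from itertools import combinations

def generate_rules(counts, n_transactions, minconf):
    """Return (antecedent, consequent, support, confidence) for each rule kept."""
    rules = []
    for itemset, n in counts.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                if antecedent not in counts:
                    continue  # cannot happen if `counts` holds all frequent itemsets
                # conf(A => B) = support(A ∪ B) / support(A)
                conf = n / counts[antecedent]
                if conf >= minconf:
                    rules.append((antecedent, itemset - antecedent,
                                  n / n_transactions, conf))
    return rules

fs = frozenset
# Support counts of the frequent itemsets in the example (6 transactions).
COUNTS = {
    fs({"Jacket"}): 2, fs({"Outwear"}): 3, fs({"Clothes"}): 4,
    fs({"Shoes"}): 2, fs({"Hiking Boots"}): 2, fs({"Footwear"}): 4,
    fs({"Outwear", "Hiking Boots"}): 2, fs({"Clothes", "Hiking Boots"}): 2,
    fs({"Outwear", "Footwear"}): 2, fs({"Clothes", "Footwear"}): 2,
}

rules = generate_rules(COUNTS, n_transactions=6, minconf=0.60)
for antecedent, consequent, s, c in rules:
    print(sorted(antecedent), "=>", sorted(consequent), f"s={s:.2f} c={c:.2f}")
```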

Basic Algorithm
Follow the steps:
  Is itemset X frequent?
  Does transaction T support X? (X contains items from different levels of the taxonomy, T contains only leaves.)
  T' = T + ancestors(T)
  Answer: T supports X ↔ X ⊆ T'
Slide 19

Details of the Basic Algorithm
  Count item occurrences.
  Generate new candidate k-itemsets.
  Add all ancestors of each item in t to t, removing any duplicates.
  Find the support of all the candidates.
  Keep only those with support over minsup.
Slide 20
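The steps above can be sketched as a level-wise loop over transactions extended with their ancestors. This is a simplified reading of the Basic algorithm, not the authors' implementation: candidates are generated by brute force from the items that are still frequent, and the names `basic`, `PARENT` and `DB` are assumptions made for the illustration.

```python
from itertools import combinations

def ancestors(item, parent):
    """Return every ancestor of `item` by walking up the taxonomy."""
    found = set()
    while item in parent:
        item = parent[item]
        found.add(item)
    return found

def basic(db, parent, minsup):
    """Level-wise mining over transactions extended with all their ancestors."""
    # T' = T + ancestors(T); using sets removes duplicates automatically.
    extended = [set(t).union(*(ancestors(i, parent) for i in t)) for t in db]
    min_count = minsup * len(db)

    # Pass 1: count item occurrences in the extended transactions.
    counts = {}
    for t in extended:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate k-itemsets: every (k-1)-subset must be frequent.
        items = sorted(set().union(*frequent))
        candidates = [frozenset(c) for c in combinations(items, k)
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))]
        # Support counting, then keep only candidates reaching minsup.
        counts = {c: sum(c <= t for t in extended) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result

# Hypothetical encoding of the example taxonomy and database.
PARENT = {
    "Jacket": "Outwear", "Ski Pants": "Outwear",
    "Outwear": "Clothes", "Shirt": "Clothes",
    "Shoes": "Footwear", "Hiking Boots": "Footwear",
}
DB = [{"Shirt"}, {"Jacket", "Hiking Boots"}, {"Ski Pants", "Hiking Boots"},
      {"Shoes"}, {"Shoes"}, {"Jacket"}]

frequent = basic(DB, PARENT, minsup=0.30)
print(frequent[frozenset({"Outwear", "Footwear"})])
```

Without further pruning this version also reports itemsets such as {Jacket, Outwear}, which contain an item together with its ancestor.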

Can You Optimize It?
Optimization 1: filtering the ancestors added to transactions
  We only need to add to transaction t the ancestors that are in one of the candidates.
  If the original item is not in any itemset, it can be dropped from the transaction.
[Figure: Clothes is the parent of Outwear and Shirts; Outwear is the parent of Jackets and Ski Pants]
Example: with candidates {clothes, shoes}, the transaction t: {Jacket, …} can be replaced with {clothes, …}.
Slide 21
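A minimal sketch of this filtering, assuming the taxonomy is stored as a child-to-parent dictionary (`PARENT`) and using a hypothetical helper `filter_transaction`:

```python
# Hypothetical encoding of the taxonomy: each item maps to its parent.
PARENT = {
    "Jacket": "Outwear", "Ski Pants": "Outwear",
    "Outwear": "Clothes", "Shirt": "Clothes",
    "Shoes": "Footwear", "Hiking Boots": "Footwear",
}

def ancestors(item):
    """Return every ancestor of `item` by walking up the taxonomy."""
    found = set()
    while item in PARENT:
        item = PARENT[item]
        found.add(item)
    return found

def filter_transaction(transaction, candidates):
    """Extend t with ancestors, keeping only items that occur in some candidate."""
    needed = set().union(*candidates)
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item)
    # Items (original or ancestor) that appear in no candidate are dropped.
    return extended & needed

candidates = [{"Clothes", "Shoes"}]
print(filter_transaction({"Jacket"}, candidates))  # only Clothes survives
```

The intersection at the end handles both parts of the optimization at once: unused ancestors are never kept, and original items that no candidate mentions are removed.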

Can You Optimize It?
Optimization 2: pre-computing ancestors
  Rather than finding the ancestors of each item by traversing the taxonomy graph, we can pre-compute the ancestors of each item.
  At the same time, we can drop the ancestors that are not contained in any of the candidates.
[Figure: Clothes is the parent of Outwear and Shirts; Outwear is the parent of Jackets and Ski Pants]
Slide 22
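One way to sketch the pre-computation is a lookup table built once from the taxonomy and then restricted to the items that appear in the current candidates. The sketch assumes a tree-shaped taxonomy with one parent per item (a DAG would need a list of parents), and the function names are hypothetical.

```python
def precompute_ancestors(parent):
    """Map every item to the set of all its ancestors, computed once."""
    table = {}

    def resolve(item):
        if item not in table:
            p = parent.get(item)
            # An item's ancestors are its parent plus the parent's ancestors.
            table[item] = set() if p is None else {p} | resolve(p)
        return table[item]

    for item in set(parent) | set(parent.values()):
        resolve(item)
    return table

def restrict_to_candidates(table, candidates):
    """Drop the ancestors that do not occur in any candidate itemset."""
    needed = set().union(*candidates)
    return {item: anc & needed for item, anc in table.items()}

# Hypothetical encoding of the taxonomy: each item maps to its parent.
PARENT = {
    "Jacket": "Outwear", "Ski Pants": "Outwear",
    "Outwear": "Clothes", "Shirt": "Clothes",
    "Shoes": "Footwear", "Hiking Boots": "Footwear",
}

table = precompute_ancestors(PARENT)
restricted = restrict_to_candidates(table, [{"Clothes", "Shoes"}])
print(sorted(table["Jacket"]), sorted(restricted["Jacket"]))
```

With the table in place, extending a transaction becomes a dictionary lookup per item instead of a walk up the taxonomy.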

Can You Optimize It?
Optimization 3: prune itemsets containing an item and its ancestor
  If we have {Jacket} and {Outwear}, we will have the candidate {Jacket, Outwear}, which is not interesting:
    s({Jacket}) = s({Jacket, Outwear})
  Deleting {Jacket, Outwear} at k=2 ensures that it will not arise for k>2 (because of the prune step of the candidate generation method).
  Therefore, we need to prune the candidates containing an item and its ancestor only for k=2; in the next steps, no candidate will include an item together with its ancestor.
Slide 23
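The pruning of 2-candidates can be sketched as a simple filter. The `PARENT` encoding and the helper name `prune_item_with_ancestor` are assumptions made for the illustration.

```python
# Hypothetical encoding of the taxonomy: each item maps to its parent.
PARENT = {
    "Jacket": "Outwear", "Ski Pants": "Outwear",
    "Outwear": "Clothes", "Shirt": "Clothes",
    "Shoes": "Footwear", "Hiking Boots": "Footwear",
}

def ancestors(item):
    """Return every ancestor of `item` by walking up the taxonomy."""
    found = set()
    while item in PARENT:
        item = PARENT[item]
        found.add(item)
    return found

def prune_item_with_ancestor(candidates):
    """Drop candidates that contain an item together with one of its ancestors."""
    return [c for c in candidates
            if not any(ancestors(item) & set(c) for item in c)]

candidates_k2 = [
    {"Jacket", "Outwear"},        # item + ancestor: pruned
    {"Jacket", "Hiking Boots"},   # kept
    {"Outwear", "Clothes"},       # item + ancestor: pruned
]
kept = prune_item_with_ancestor(candidates_k2)
print(kept)
```

Because larger candidates are only generated from surviving smaller ones, applying this filter once at k=2 is enough.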

Summary
Importance of hierarchy in real-world applications
How?
  Build a DAG
  Redefine the problem of ARM
  Get association rules
Don’t take these ideas in isolation!
  Applicable to all the advances we will see in the next classes
  Real-world problems usually require the mixing of many ideas
Slide 24

Next Class
Advanced topics in association rule mining
Slide 25
