Lecture 16 - Advanced topics on association rules PART III


Published on March 9, 2009

Author: aorriols

Source: slideshare.net

Introduction to Machine Learning
Lecture 16: Advanced Topics in Association Rules Mining
Albert Orriols i Puig
http://www.albertorriols.net
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull

Recap of Lectures 13-15
Ideas come from market basket analysis (MBA). Let’s go shopping!
Customer1: milk, eggs, sugar, bread
Customer2: milk, eggs, cereal, bread
Customer3: eggs, sugar
What do my customers buy? Which products are bought together?
Aim: Find associations and correlations between the different items that customers place in their shopping basket.
Slide 2 Artificial Intelligence Machine Learning

Recap of Lecture 15
Aim: Find associations between items.
But wait!
There are many different diapers: Dodot, Huggies, …
There are many different beers: Heineken, Desperados, Kingfisher, … in bottle or can …
Figure: an item taxonomy with Clothes at the top, Outerwear and Shirts below it, and Jackets and Ski Pants under Outerwear.
Which rule do you prefer?
diapers ⇒ beer
Dodot diapers M ⇒ Dam beer in can
Which will have greater support?
Slide 3

Today’s Agenda
Continuing our journey through some advanced topics in ARM:
Mining frequent patterns without candidate generation
Multiple level AR
Sequential pattern mining
Quantitative association rules
Mining class association rules
Beyond support & confidence
Applications
Slide 4

Introduction to Seq. AR
So far, we have seen:
Apriori
FP-growth
Mining multiple level AR
But none of them considers the order of transactions.
However, is the sequence important? Which came first, the hen or the egg?
Sometimes it is really important:
Analyze the sequence of items bought by a customer
Web usage mining searches for navigational patterns of users
Slide 5

An Example in Web Usage Mining
Web sequence: < {Homepage} {Electronics} {Computers} {Laptops} {Sony Vaio} {Order Confirmation} {Return to Shopping} >
Slide 6

Definition
Defining the problem:
Let I = {i_1, i_2, …, i_m} be a set of items.
Sequence: an ordered list of itemsets.
Itemset/element: a non-empty set of items X ⊆ I.
We denote a sequence s by 〈a_1 a_2 … a_r〉, where a_i is an itemset, which is also called an element of s.
An element (or itemset) of a sequence is denoted by {x_1, x_2, …, x_k}, where x_j ∈ I is an item.
We assume without loss of generality that the items in an element of a sequence are in lexicographic order.
Slide 7

Definition
Defining the problem:
Size: the size of a sequence is the number of elements (or itemsets) in the sequence.
Length: the length of a sequence is the number of items in the sequence. A sequence of length k is called a k-sequence.
A sequence s_1 = 〈a_1 a_2 … a_r〉 is a subsequence of another sequence s_2 = 〈b_1 b_2 … b_v〉, or s_2 is a supersequence of s_1, if there exist integers 1 ≤ j_1 < j_2 < … < j_{r−1} < j_r ≤ v such that a_1 ⊆ b_{j_1}, a_2 ⊆ b_{j_2}, …, a_r ⊆ b_{j_r}. We also say that s_2 contains s_1.
Slide 8

Example
Let I = {1, 2, 3, 4, 5, 6, 7, 8, 9}.
The sequence 〈{3}{4, 5}{8}〉 is contained in (or is a subsequence of) 〈{6}{3, 7}{9}{4, 5, 8}{3, 8}〉, because {3} ⊆ {3, 7}, {4, 5} ⊆ {4, 5, 8}, and {8} ⊆ {3, 8}.
However, 〈{3}{8}〉 is not contained in 〈{3, 8}〉, or vice versa.
The size of the sequence 〈{3}{4, 5}{8}〉 is 3, and the length of the sequence is 4.
Slide 9
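The definitions of size, length and containment can be checked on this example with a short Python sketch. The function names and the representation (a sequence as a list of frozensets) are assumptions made for illustration; they do not come from the lecture.

```python
def seq(*elements):
    """Build a sequence as a list of frozensets, one per element (itemset)."""
    return [frozenset(e) for e in elements]

def size(s):
    """Number of elements (itemsets) in the sequence."""
    return len(s)

def length(s):
    """Total number of items in the sequence."""
    return sum(len(e) for e in s)

def is_subsequence(s1, s2):
    """True if s1 is contained in s2.

    Each element of s1 must be a subset of some element of s2, with the
    matched elements of s2 appearing in strictly increasing positions.
    Matching each element as early as possible is enough to decide this.
    """
    j = 0
    for a in s1:
        while j < len(s2) and not a <= s2[j]:
            j += 1
        if j == len(s2):
            return False
        j += 1  # the next element of s1 must match a later element of s2
    return True

s1 = seq({3}, {4, 5}, {8})
s2 = seq({6}, {3, 7}, {9}, {4, 5, 8}, {3, 8})

print(is_subsequence(s1, s2))                       # True
print(is_subsequence(seq({3}, {8}), seq({3, 8})))   # False
print(is_subsequence(seq({3, 8}), seq({3}, {8})))   # False
print(size(s1), length(s1))                         # 3 4
```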

Objective
Objective of sequential pattern mining (SPM)
Input: a set S of input data sequences (or sequence database).
Goal: the problem of mining sequential patterns is to find all the sequences that have a user-specified minimum support.
Each such sequence is called a frequent sequence, or a sequential pattern.
The support for a sequence is the fraction of total data sequences in S that contain this sequence.
Slide 10

Example
Transactions:
Customer ID | Transaction time | Items bought
1 | July 20, 2005 | 30
1 | July 25, 2005 | 90
2 | July 9, 2005 | 10, 20
2 | July 14, 2005 | 30
2 | July 20, 2005 | 40, 60, 70
3 | July 25, 2005 | 30, 50, 70
4 | July 25, 2005 | 30
4 | July 29, 2005 | 40, 70
4 | August 2, 2005 | 90
5 | July 12, 2005 | 90
Customer sequences:
Customer ID | Customer sequence
1 | <(30) (90)>
2 | <(10 20) (30) (40 60 70)>
3 | <(30 50 70)>
4 | <(30) (40 70) (90)>
5 | <(90)>
Sequential patterns with support > 25%:
1-sequences: <(30)> <(40)> <(70)> <(90)>
2-sequences: <(30)(40)> <(30)(70)> <(30)(90)> <(40 70)>
3-sequences: <(30)(40 70)>
Example borrowed from Bing Liu
Slide 11
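The support values behind this example can be recomputed directly from the customer sequences. The sketch below is only illustrative: the names `DB`, `contains` and `support` are assumptions, and each customer sequence is written as a list of Python sets.

```python
# Customer sequences from the example, keyed by customer ID.
DB = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

def contains(data_seq, pattern):
    """True if `pattern` is a subsequence of `data_seq` (greedy matching)."""
    j = 0
    for element in pattern:
        while j < len(data_seq) and not element <= data_seq[j]:
            j += 1
        if j == len(data_seq):
            return False
        j += 1
    return True

def support(pattern, db):
    """Fraction of data sequences in db that contain the pattern."""
    return sum(contains(s, pattern) for s in db.values()) / len(db)

print(support([{30}], DB))            # 0.8
print(support([{30}, {40, 70}], DB))  # 0.4
print(support([{30, 70}], DB))        # 0.2, below the 25% threshold
```

With five customers, a support above 25% means the pattern must occur in at least two customer sequences, which is why <(30 70)>, bought together only by customer 3, is not listed.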

GSP
GSP follows Apriori closely, but for sequential patterns.
If a sequence S is not frequent, then none of the super-sequences of S is frequent.
For instance, if <ab> is infrequent, so are <acb> and <(ca)b>.
GSP follows these steps:
Initially, every item in the DB is a candidate of length 1.
For each level (i.e., sequences of length k) do:
Scan the database to collect the support count for each candidate sequence.
Generate candidate length-(k+1) sequences from length-k frequent sequences using Apriori.
Repeat until no frequent sequence or no candidate can be found.
Strength: candidate pruning by Apriori.
Slide 12

The Algorithm
Does this remind you of Apriori?
Slide 13
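The level-wise idea can be sketched in Python as follows. This is a simplified illustration, not the full GSP algorithm: instead of GSP's join-and-prune candidate generation, it grows each frequent length-k sequence by one frequent item (as a new element, or added to the last element), which is an assumption made to keep the code short. All names are hypothetical.

```python
from itertools import chain

def contains(data_seq, pattern):
    """True if `pattern` is a subsequence of `data_seq` (greedy matching)."""
    j = 0
    for element in pattern:
        need = set(element)
        while j < len(data_seq) and not need <= data_seq[j]:
            j += 1
        if j == len(data_seq):
            return False
        j += 1
    return True

def support_count(pattern, db):
    """Number of data sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in db)

def extensions(pattern, items):
    """Grow a pattern (tuple of sorted item tuples) by exactly one item."""
    for x in items:
        yield pattern + ((x,),)                  # x starts a new element
        last = pattern[-1]
        if x > last[-1]:                         # keep items in sorted order
            yield pattern[:-1] + (last + (x,),)  # x joins the last element

def mine_sequences(db, min_count):
    """Return {pattern: support count} for all frequent sequences."""
    db = [[set(e) for e in s] for s in db]
    items = sorted(set(chain.from_iterable(chain.from_iterable(db))))
    # Level 1: every item is a candidate of length 1.
    counts = {((x,),): support_count(((x,),), db) for x in items}
    level = {p: c for p, c in counts.items() if c >= min_count}
    frequent_items = [p[0][0] for p in level]
    frequent = {}
    while level:
        frequent.update(level)
        # Candidates of length k+1 come only from frequent length-k patterns.
        candidates = {c for p in level for c in extensions(p, frequent_items)}
        level = {}
        for cand in candidates:
            c = support_count(cand, db)          # one pass over the database
            if c >= min_count:
                level[cand] = c
    return frequent

DB = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]
patterns = mine_sequences(DB, min_count=2)
for p in sorted(patterns, key=lambda p: (sum(map(len, p)), p)):
    print(p, patterns[p])
```

On the customer sequences of the earlier example, with a minimum count of 2 (support above 25%), this yields the nine patterns listed on that slide.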

Quantitative AR
Transaction ID | Age | Married | NumCars
1 | 23 | No | 1
2 | 25 | Yes | 1
3 | 29 | No | 0
4 | 34 | Yes | 2
5 | 38 | Yes | 2
<Age: 30..39> and <Married: Yes> => <NumCars: 2>
Support = 40%, Conf = 100%
How can we deal with these data?
Slide 14

Map to Boolean Values
Record ID | Age [20..29] | Age [30..39] | Married: Yes | Married: No | NumCars: 0 | NumCars: 1
100 | 1 | 0 | 0 | 1 | 0 | 1
200 | 1 | 0 | 1 | 0 | 0 | 1
300 | 1 | 0 | 0 | 1 | 1 | 0
400 | 0 | 1 | 1 | 0 | 0 | 0
500 | 0 | 1 | 1 | 0 | 0 | 0
Now, use any system for mining boolean AR:
Apriori
FP-growth
Slide 15
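A minimal Python sketch of this mapping follows, assuming ten-year age intervals as in the table and one boolean item per attribute value. The item naming scheme and the helper names are illustrative assumptions, not part of the lecture.

```python
# The five records of the quantitative example.
records = [
    {"Age": 23, "Married": "No",  "NumCars": 1},
    {"Age": 25, "Married": "Yes", "NumCars": 1},
    {"Age": 29, "Married": "No",  "NumCars": 0},
    {"Age": 34, "Married": "Yes", "NumCars": 2},
    {"Age": 38, "Married": "Yes", "NumCars": 2},
]

def to_items(record):
    """Turn one record into a set of boolean items (assumed naming scheme)."""
    low = 10 * (record["Age"] // 10)      # 23 -> 20, 34 -> 30
    return {
        f"Age:{low}..{low + 9}",
        f"Married:{record['Married']}",
        f"NumCars:{record['NumCars']}",
    }

transactions = [to_items(r) for r in records]

def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

body = {"Age:30..39", "Married:Yes"}
head = {"NumCars:2"}
print(support(body | head))    # 0.4
print(confidence(body, head))  # 1.0
```

Once the records are sets of boolean items, any boolean miner such as Apriori or FP-growth can be run on `transactions` unchanged.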

Problems with this Approach
MinSup: if the number of intervals is large, the support of a single interval can be low.
MinConf: information is lost when partitioning values into intervals. Confidence can be lower as the number of intervals gets smaller.
Example
In the partition used: <NumCars:0> ⇒ <Married:No> c=100%
But now assume that, in the partition, NumCars:0 and NumCars:1 go to the same interval: <NumCars:0,1> ⇒ <Married:No> c=66.67%
Slide 16
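The trade-off in this example can be reproduced with a few lines of Python. This is a sketch only; the helper names are assumptions, and the data are the (NumCars, Married) columns of the five records in the quantitative example.

```python
# (NumCars, Married) for records 1..5 of the quantitative example.
rows = [(1, "No"), (1, "Yes"), (0, "No"), (2, "Yes"), (2, "Yes")]

def support(car_values):
    """Fraction of records whose NumCars falls in the given interval."""
    return sum(cars in car_values for cars, _ in rows) / len(rows)

def confidence(car_values, married):
    """Confidence of <NumCars in car_values> => <Married: married>."""
    covered = [m for cars, m in rows if cars in car_values]
    return covered.count(married) / len(covered)

# Fine partition: the interval is rare, but the rule is exact.
print(support({0}), confidence({0}, "No"))
# Merged interval: more support, lower confidence.
print(support({0, 1}), round(100 * confidence({0, 1}, "No"), 2))
```

Merging the two intervals raises the support of the antecedent from 20% to 60% while the confidence drops from 100% to 66.67%.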

Problems with this Approach
How can we solve this problem?
Increase the number of intervals (to reduce information loss) while combining adjacent ones (to increase support).
ExecTime: execution time blows up as the number of items per record increases.
ManyRules: the number of rules also blows up. Many of them will not be interesting.
Slide 17

Second Approach
Other solutions?
Well, the problem was that the intervals were not the best ones.
Let’s try to create the best intervals for our data.
How? Discretizing/clustering techniques:
Apply a discretizing/clustering technique to find the best partitions.
Employ those partitions.
We’ll see how clustering techniques work in the next class. So, keep this in mind and put the pieces together next class!
Slide 18

Third Approach
And what if we do not map the input to a boolean space?
Create interval-based association rules directly.
So, decide the best interval and, then, count the support.
Usually, these approaches do not provide all the association rules, but only the ones with larger support and confidence.
Fuzzy logic can also be applied here. But again, we’ll see GFS in two or three lectures.
Slide 19

Mining Class Association Rules
So far, we have seen ARM without any specific target.
It finds all possible rules that exist in the data, i.e., any item can appear as a consequent or a condition of a rule.
However, what if we are interested in some specific targets?
E.g.: the user has a set of text documents from some known topics. He/she wants to find out what words are associated or correlated with each topic.
So, now, we want to find: X ⇒ y, where X ⊆ I, and y ∈ Y.
The algorithms are very similar to those of ARM.
We are not going to see them in class. But you have information on the estudy.
Slide 20
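To make the form X ⇒ y concrete, here is a brute-force Python sketch on a hypothetical toy corpus. The documents, topics, thresholds and the function name `class_rules` are all invented for illustration, and the exhaustive enumeration stands in for the Apriori-style algorithms, which the lecture does not detail.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (set of words in the document, topic label).
docs = [
    ({"game", "team", "score"}, "sports"),
    ({"game", "team"}, "sports"),
    ({"vote", "party", "team"}, "politics"),
    ({"vote", "party"}, "politics"),
]

def class_rules(data, min_sup, min_conf, max_len=2):
    """Enumerate rules X => y with |X| <= max_len, X a word set, y a label."""
    n = len(data)
    body_count = Counter()   # how many documents contain X
    rule_count = Counter()   # how many documents contain X and have label y
    for items, label in data:
        for k in range(1, max_len + 1):
            for body in combinations(sorted(items), k):
                body_count[body] += 1
                rule_count[(body, label)] += 1
    rules = []
    for (body, label), c in rule_count.items():
        sup, conf = c / n, c / body_count[body]
        if sup >= min_sup and conf >= min_conf:
            rules.append((body, label, sup, conf))
    return sorted(rules)

rules = class_rules(docs, min_sup=0.5, min_conf=1.0)
for body, label, sup, conf in rules:
    print(body, "=>", label, sup, conf)
```

The only difference from ordinary ARM is that the consequent is restricted to a class label y ∈ Y; the word "team" appears under both topics, so no rule with body {team} alone reaches the confidence threshold.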

Beyond Support and Confidence
Support and confidence are the basic measures of interestingness.
But many more have been proposed during the last few years.
Slide 21

Some Applications
Wal-Mart has used the technique for years to mine POS data and arrange their stores to maximize sales from such analysis.
Medical databases, to discover commonly occurring diseases amongst groups of people.
Lottery results databases, to discover those lucky combinations of numbers.
Slide 22

Some Applications
Power System Restoration (PSR)
PSR is a multi-objective, multi-period, nonlinear, mixed integer optimization problem with various constraints and unforeseeable factors.
Discovery of associations that help build heuristics for PSR.
Actions in a PSR:
start_black_start_unit(x)
energize_line(x)
pick_up_load(x)
synchronize(x,y)
connect_tie_line(x)
crank_unit(x)
energize_busbar(x)
Slide 23

Some Applications
Correlations with color, spatial relationships, etc.
From coarse to fine resolution mining
Slide 24

Next Class
Clustering
Slide 25

