50 %
50 %
Information about AssocSeq

Published on February 5, 2008

Author: Tito1


Association Rule and Sequential Pattern Mining for Episode Extraction:  Association Rule and Sequential Pattern Mining for Episode Extraction Jonathan Yip Introduction to Association Rule:  Introduction to Association Rule Associating multiple objects/events together Example: A customer buying a laptop also buys a wireless LAN card (2- itemset) Wireless LAN Card Laptop Laptop  Wireless LAN Card Association Rule (con’t):  Association Rule (con’t) Measures of Rule Interestingness Support == P(Laptop ∪ LAN card) Probability that all studied sets occur Confidence == P(LAN card ∣Laptop) =P(Laptop U LAN card)/P(Laptop) Conditional Probability that a customer bought Laptop also bought Wireless LAN card Buy both Thresholds: Minimum Support: 25% Minimum Confidence: 30% [Support = 40%, Confidence = 60%] Laptop Wireless LAN Card Association Rule (eg.):  Association Rule (eg.) Min_Sup = 25% Min_Conf = 25% Milk  Eggs Support : P(Milk ∪ Eggs) = 3/5 = 60% Confidence : P (Eggs|Milk) = P(Milk U Eggs)/P(Milk) P(Milk) = 4/5 = 80% P(Eggs∣Milk)=60%/80% = 75% (75% Confidence that a customer buys milk also buys eggs) Types of Association :  Types of Association Boolean vs. Quantitative Single dimension vs. Multiple dimension Single level vs. Multiple level Analysis Example: 1.) Gender(X,”Male”) ^ Income(X,”>50K”) ^Age(X,”35…50”)  Buys (X, BMW Sedan) 2.) Income(X,,”>50K”)  Buys (X, BMW Sedan) 3.) Gender(X,”Male”) ^ Income(X,”>50K”) ^Age(X,”35…50”)  Buys (X, BMW 540i) Association Rule (DB Miner):  Association Rule (DB Miner) Apriori Algorithm:  Apriori Algorithm Purpose To mine frequent itemsets for boolean association rules Use prior knowledge to predict future values Has to be frequent (Support>Min_Sup) Anti-monotone concept If a set cannot pass a min_sup test, all supersets will fail as well Apriori Algorithm Psuedo-Code:  Apriori Algorithm Psuedo-Code Pseudo-code: Ck: Candidate itemset of size k Lk : frequent itemset of size k L1 = {frequent items}; for (k = 1; Lk !=; k++) do begin Ck+1 = candidates generated from Lk; for each transaction t in database do increment the count of all candidates in Ck+1 that are contained in t Lk+1 = candidates in Ck+1 with min_support end return k Lk; Apriori Algorithm Procedures:  Apriori Algorithm Procedures Step 1 Scan & find support of each item (C1): Example revisited: 5 – itemset with 5 transactions Min_Sup = 25%  Min Support Count = 2 items Min_Conf = 25% Step 2 Compare with Min_Sup and eliminate (prune) I<Min_Sup (L1): Apriori Algorithm (con’t):  Apriori Algorithm (con’t) Step 3 Join (L1 L1) Repeated Step: Eliminate (prune) items<min_supPrune (C2): L1 set L1 set Slide11:  L2 set Join L2 L2 L2 set Compare with Min_Sup then eliminate (prune) items <Min_sup: Conclusion: Bread & Coke & Milk have strong correlation Coke & Milk & Eggs have strong correlation Apriori Algorithm (con’t) Sequential Pattern Mining:  Sequential Pattern Mining Introduction Mining of frequently occurring patterns related to time or other sequences Examples 70% of customers rent “Star Wars, then “Empire Strikes Back”, and then “Return of the Jedi Application Intrusion detection on computers Web access pattern Predict disease with sequence of symptoms Many other areas Star Wars Empire Strikes Back Return of the Jedi Sequential Pattern Mining (con’t):  Sequential Pattern Mining (con’t) Steps: Sort Phase Sort by Cust_ID, Transaction_ID Litemset Phase Find large itemsets Transform Phase Eliminates items < min_sup Sequence Phase Find desired sequences Maximal Phase Find the maximal sequences among set of large sequences Sequential Pattern Mining (con’t):  Sequential Pattern Mining (con’t) Example: Database sorted by Cust_ID & Transaction Time (Min_sup=25%) Organized format with Cust_ID: Sequential Pattern Mining (con’t):  Sequential Pattern Mining (con’t) Step 1: Sort (examples of several transaction): Conclusion: >25% Min_sup {(3) (9)} && {(3) (4,7)} Sequential Pattern Mining (con’t):  Sequential Pattern Mining (con’t) Data sequence of each customer: Sequences < min_support: {(1,2) (3)}, {(3)},{(4)},{(7)},{(9)}, {(3) (4)}, {(3) (7), {(4) (7)} Support > 25% {(3) (9)} {(3) (4 7)} The most right column implies customers buying patterns Step 2: Litemset phase Sequential Pattern Mining Algorithm:  Sequential Pattern Mining Algorithm Algorithm AprioriAll Count all large sequence, including those not maximal Pseudo-code: Ck: Candidate sequence of size k Lk : frequent or large sequence of size k  L1 = {large 1-sequence};  //result of litemset phase for (k = 2; Lk !=; k++) do begin      Ck = candidates generated from Lk-1;     for each customer sequence c in database do        Increment the count of all candidates in Ck            that are contained in c     end Answer=Maximal sequences in k Lk; AprioriSome Generates every candidate sequence, but skips counting some large sequences (Forward Phase). Then, discards candidates not maximal and counts remaining large sequences (Backward Phase). 鸞 Episode Extraction:  Episode Extraction A partially ordered collection of events occurring together Goal: To analyze sequence of events, and to discover recurrent episodes First finding small frequent episodes then progressively looking larger episodes Types of episodes Serial () – E occurs before F Parallel() – No constraints on relativelyorder of A & B Non-Serial/Non-Parallel () - Occurrence of A & B precedes C    Episode Extraction (con’t):  Episode Extraction (con’t) S = {(A1,t1),(A2,t2),….,(An, tn)  s={(E,31),(D,32),(F,33)….(A,65)} Time window is set to bind the interestingness W(s,5) slides and snapshot the whole sequence eg. (w,35,40) contains A,B,C,E episodes ,  occurs but not  User specifies how many windows an episode has to occur to be frequent Formula : A Sequence of events: Episode Extraction:  Episode Extraction Minimal occurrences Look at exact occurrences of episodes & relationships between occurrences Can modify width of window Eliminates unnecessary repetition of the recognition effort Example mo() = {[35,38), [46,48),[57,60)} When episode is a subepisode of another; this relation is used for discovering all frequent episodes Applications of Episodes Extraction:  Applications of Episodes Extraction Computer Security Bioinformatics Finance Market Analysis And more…… References:  References Discovery of Frequent Episodes in Event Sequences (Manilla,Toivonen, Verkamo) Mining Sequential Patterns (Agrawal, Srikant) Principles of Data Mining (Hand, Manilla, Smyth) 2001 Data Mining Concepts and Techniques (Han, Kamber) 2001 Slide23:  END

Add a comment

Related presentations

Related pages

Class: SeqCanvas - University of California, San Francisco

assocSeq ; assocSeq ( self, aseq, eventType='', function="", ) alignment sequence has gained or lost associated structure bboxList ...
Read more

The MPL Reference Manual: at - 1.57.0 - Boost C++ Libraries

Parameter Requirement Description; Sequence: Forward Sequence: A sequence to be examined. AssocSeq: Associative Sequence: A sequence to be examined. N ...
Read more

Remove old Vector test file · cartazio/hs-cblas@a30fec4 ...

hs-cblas - Haskell BLAS bindings ... You can clone with HTTPS
Read more

The MPL Reference Manual: erase_key - 1.59.0

AssocSeq: Extensible Associative Sequence: A sequence to erase elements from. Key: Any type: A key for the elements to be removed. Expression semantics.
Read more

BoostMplRoadmap – Boost C++ Libraries

at form is not supported (contrary to the docs, ...
Read more

Ppt Serial-extraction | Powerpoint Presentations and ...

Source : E-Learning: Case Studies in Web-Controlled Devices and Remote ... PPT. Presentation Summary :...
Read more

Boost users' mailing page: By Thread

[Boost-users] Re: at Aleksey Gurtovoy (2005-03-01 05:22:46) [Boost-users] special_functions_test Gennadiy Rozental (2005-02-24 14:35:43)
Read more

Boost users' mailing page: By Author

[Boost-users] Re: at (2005-03-01 05:22:46) Alexis H. Rivera-Rios [Boost-users] regex expressions in boost::regex vs python, ...
Read more