Data Mining
Association Analysis: Basic Concepts and Algorithms
Lecture Notes for Chapter 6, Introduction to Data Mining
by Tan, Steinbach, Kumar
© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004
Association Rule Mining
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

Market-Basket transactions:
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Example of Association Rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!

Definition: Frequent Itemset
Itemset
- A collection of one or more items
  Example: {Milk, Bread, Diaper}
- k-itemset: an itemset that contains k items

Support count (σ)
- Frequency of occurrence of an itemset
- E.g. σ({Milk, Bread, Diaper}) = 2

Support (s)
- Fraction of transactions that contain an itemset
- E.g. s({Milk, Bread, Diaper}) = 2/5

Frequent Itemset
- An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
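To make the definitions concrete, here is a small sketch (not from the slides; the helper names are illustrative) that computes σ and s over the five transactions above:

```python
# Sketch: support count sigma(X) and support fraction s(X) for the
# five market-basket transactions above. Helper names are my own.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """s(X): fraction of transactions that contain X."""
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"Milk", "Bread", "Diaper"}, transactions))  # 2
print(support({"Milk", "Bread", "Diaper"}, transactions))        # 0.4
```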
Definition: Association Rule

Association Rule
- An implication expression of the form X → Y, where X and Y are itemsets
- Example: {Milk, Diaper} → {Beer}

Rule Evaluation Metrics
- Support (s): fraction of transactions that contain both X and Y
- Confidence (c): measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}
  s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
  c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 = 0.67

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
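The two metrics can be sketched directly from their definitions (assumed helper names, same five transactions):

```python
# Sketch: support and confidence of a rule X -> Y on the data above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(X):
    # Support count: transactions containing every item of X.
    return sum(1 for t in transactions if X <= t)

def rule_support(X, Y):
    # s(X -> Y) = sigma(X ∪ Y) / |T|
    return sigma(X | Y) / len(transactions)

def rule_confidence(X, Y):
    # c(X -> Y) = sigma(X ∪ Y) / sigma(X)
    return sigma(X | Y) / sigma(X)

X, Y = {"Milk", "Diaper"}, {"Beer"}
print(rule_support(X, Y))     # 0.4
print(rule_confidence(X, Y))  # 0.666... (= 2/3)
```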
Association Rule Mining Task

Given a set of transactions T, the goal of association rule mining is to find all rules having
- support ≥ minsup threshold
- confidence ≥ minconf threshold

Brute-force approach:
- List all possible association rules
- Compute the support and confidence for each rule
- Prune rules that fail the minsup and minconf thresholds
- Computationally prohibitive!
Mining Association Rules

Example of Rules:
{Milk, Diaper} → {Beer}    (s=0.4, c=0.67)
{Milk, Beer} → {Diaper}    (s=0.4, c=1.0)
{Diaper, Beer} → {Milk}    (s=0.4, c=0.67)
{Beer} → {Milk, Diaper}    (s=0.4, c=0.67)
{Diaper} → {Milk, Beer}    (s=0.4, c=0.5)
{Milk} → {Diaper, Beer}    (s=0.4, c=0.5)

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Observations:
- All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
- Rules originating from the same itemset have identical support but can have different confidence
- Thus, we may decouple the support and confidence requirements

Mining Association Rules
Two-step approach:
1. Frequent Itemset Generation
   - Generate all itemsets whose support ≥ minsup
2. Rule Generation
   - Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Frequent itemset generation is still computationally expensive.

Frequent Itemset Generation
[Figure: itemset lattice over five items A–E, from the null set at the top, through all 1-, 2-, 3- and 4-itemsets, down to ABCDE at the bottom.]

Given d items, there are 2^d possible candidate itemsets.

Frequent Itemset Generation
Brute-force approach:
- Each itemset in the lattice is a candidate frequent itemset
- Count the support of each candidate by scanning the database
- Match each transaction against every candidate
- Complexity ~ O(NMw), expensive since M = 2^d

[Figure: N transactions (maximum width w) matched against a list of M candidates.]

Computational Complexity
Given d unique items:
- Total number of itemsets = 2^d
- Total number of possible association rules:

    R = Σ_{k=1}^{d-1} [ C(d, k) × Σ_{j=1}^{d-k} C(d-k, j) ] = 3^d − 2^{d+1} + 1

If d = 6, R = 602 rules.
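The count can be checked with a short sketch (helper names are my own), summing over antecedent size k and consequent size j and comparing against the closed form:

```python
from math import comb

def rule_count(d):
    """Number of possible rules over d items, counted directly:
    pick a k-item antecedent, then a nonempty consequent
    from the remaining d - k items."""
    return sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
               for k in range(1, d))

def rule_count_closed(d):
    """Closed form from the slide: R = 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1

print(rule_count(6), rule_count_closed(6))  # 602 602
```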
Frequent Itemset Generation Strategies

Reduce the number of candidates (M)
- Complete search: M = 2^d
- Use pruning techniques to reduce M

Reduce the number of transactions (N)
- Reduce size of N as the size of itemset increases
- Used by DHP and vertical-based mining algorithms

Reduce the number of comparisons (NM)
- Use efficient data structures to store the candidates or transactions
- No need to match every candidate against every transaction

Reducing Number of Candidates
Apriori principle:
- If an itemset is frequent, then all of its subsets must also be frequent

The Apriori principle holds due to the following property of the support measure:

    ∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

- Support of an itemset never exceeds the support of its subsets
- This is known as the anti-monotone property of support

Illustrating Apriori Principle

[Figure: itemset lattice over A–E; once {A,B} is found to be infrequent, all of its supersets are pruned.]

Illustrating Apriori Principle
Minimum Support = 3

Items (1-itemsets):
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets):
(No need to generate candidates involving Coke or Eggs)
Itemset          Count
{Bread,Milk}     3
{Bread,Beer}     2
{Bread,Diaper}   3
{Milk,Beer}      2
{Milk,Diaper}    3
{Beer,Diaper}    3

Triplets (3-itemsets):
Itemset               Count
{Bread,Milk,Diaper}   3

If every subset is considered: 6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41 candidates.
With support-based pruning: 6 + 6 + 1 = 13 candidates.
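The anti-monotone property that justifies this pruning can be verified exhaustively on the example data; the following is a sketch with illustrative helper names:

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
items = sorted(set().union(*transactions))

def support(X):
    return sum(1 for t in transactions if X <= t) / len(transactions)

# For every itemset Y and every proper nonempty subset X of Y,
# check s(X) >= s(Y): support never increases when items are added.
violations = 0
for r in range(1, len(items) + 1):
    for Y in combinations(items, r):
        for k in range(1, r):
            for X in combinations(Y, k):
                if support(set(X)) < support(set(Y)):
                    violations += 1
print(violations)  # 0
```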
Apriori Algorithm

Method:
- Let k = 1
- Generate frequent itemsets of length 1
- Repeat until no new frequent itemsets are identified:
  - Generate length-(k+1) candidate itemsets from length-k frequent itemsets
  - Prune candidate itemsets containing subsets of length k that are infrequent
  - Count the support of each candidate by scanning the DB
  - Eliminate candidates that are infrequent, leaving only those that are frequent
Reducing Number of Comparisons

Candidate counting:
- Scan the database of transactions to determine the support of each candidate itemset
- To reduce the number of comparisons, store the candidates in a hash structure
- Instead of matching each transaction against every candidate, match it against the candidates contained in the hashed buckets

[Figure: N transactions matched against a hash structure with k buckets.]
Generate Hash Tree

Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

You need:
- A hash function (in the figure, items 1, 4, 7 hash to one branch; 2, 5, 8 to another; 3, 6, 9 to the third)
- Max leaf size: the maximum number of itemsets stored in a leaf node (if the number of candidate itemsets exceeds the max leaf size, split the node)

[Figure: the 15 candidates arranged in a candidate hash tree.]

Association Rule Discovery: Hash tree

[Figure: candidate hash tree; at the root, hash on 1, 4 or 7.]

Association Rule Discovery: Hash tree

[Figure: candidate hash tree; at the root, hash on 2, 5 or 8.]

Association Rule Discovery: Hash tree

[Figure: candidate hash tree (figure truncated in the source).]
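The root-level routing in these figures can be sketched as follows. The grouping 1,4,7 / 2,5,8 / 3,6,9 corresponds to hashing on the item value mod 3; the variable names are my own:

```python
from collections import defaultdict

# The slides' hash function: items 1,4,7 share a branch, 2,5,8 share
# another, 3,6,9 the third -- i.e. hash on item value mod 3.
def h(item):
    return item % 3

candidates = [
    (1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8), (1, 5, 9),
    (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5), (3, 5, 6), (3, 5, 7),
    (6, 8, 9), (3, 6, 7), (3, 6, 8),
]

# At the root of the hash tree, each candidate is routed on its first item.
root = defaultdict(list)
for c in candidates:
    root[h(c[0])].append(c)

# Sizes of the three root branches: "hash on 1,4,7", "2,5,8", "3,6,9".
print(len(root[h(1)]), len(root[h(2)]), len(root[h(3)]))  # 7 2 6
```

Deeper levels of the tree repeat the same routing on the second and third items until a leaf (bounded by the max leaf size) is reached.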