Data Mining
Association Analysis: Basic Concepts and Algorithms

Lecture Notes for Chapter 6
Introduction to Data Mining
by Tan, Steinbach, Kumar
© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

Association Rule Mining

Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

Market-basket transactions:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of association rules:

{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!

Definition: Frequent Itemset
- Itemset
  A collection of one or more items.
  Example: {Milk, Bread, Diaper}
  A k-itemset is an itemset that contains k items.

- Support count (σ)
  Frequency of occurrence of an itemset.
  E.g. σ({Milk, Bread, Diaper}) = 2

- Support (s)
  Fraction of transactions that contain an itemset.
  E.g. s({Milk, Bread, Diaper}) = 2/5

- Frequent itemset
  An itemset whose support is greater than or equal to a minsup threshold.
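These definitions can be computed directly from the transaction table; a minimal Python sketch (the transaction list mirrors the five market-basket transactions above):

```python
from fractions import Fraction

# The five market-basket transactions from the example table.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """s(X): fraction of transactions that contain X."""
    return Fraction(support_count(itemset, transactions), len(transactions))

print(support_count({"Milk", "Bread", "Diaper"}, transactions))  # 2
print(support({"Milk", "Bread", "Diaper"}, transactions))        # 2/5
```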
Definition: Association Rule

- Association rule
  An implication expression of the form X → Y, where X and Y are itemsets.
  Example: {Milk, Diaper} → {Beer}

- Rule evaluation metrics
  Support (s): fraction of transactions that contain both X and Y.
  Confidence (c): measures how often items in Y appear in transactions that contain X.

Example: {Milk, Diaper} → {Beer}

  s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
  c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 = 0.67
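Plugging the example rule into these definitions, a small sketch (reusing the five transactions from the table):

```python
from fractions import Fraction

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"Milk", "Diaper"}, {"Beer"}
s = Fraction(sigma(X | Y), len(transactions))  # support of X -> Y
c = Fraction(sigma(X | Y), sigma(X))           # confidence of X -> Y

print(s, c)  # 2/5 2/3
```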
Association Rule Mining Task

Given a set of transactions T, the goal of association rule mining is to find all rules having
- support ≥ minsup threshold
- confidence ≥ minconf threshold

Brute-force approach:
- List all possible association rules
- Compute the support and confidence for each rule
- Prune rules that fail the minsup and minconf thresholds
=> Computationally prohibitive!

Mining Association Rules
Example of rules from {Milk, Diaper, Beer}:

{Milk, Diaper} → {Beer}   (s=0.4, c=0.67)
{Milk, Beer} → {Diaper}   (s=0.4, c=1.0)
{Diaper, Beer} → {Milk}   (s=0.4, c=0.67)
{Beer} → {Milk, Diaper}   (s=0.4, c=0.67)
{Diaper} → {Milk, Beer}   (s=0.4, c=0.5)
{Milk} → {Diaper, Beer}   (s=0.4, c=0.5)

Observations:
- All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
- Rules originating from the same itemset have identical support but can have different confidence
- Thus, we may decouple the support and confidence requirements
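The six rules above can be reproduced by enumerating every binary partition of {Milk, Diaper, Beer}; a sketch over the same five transactions, showing that the support is shared while the confidence varies with the antecedent:

```python
from fractions import Fraction
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    return sum(1 for t in transactions if itemset <= t)

itemset = {"Milk", "Diaper", "Beer"}
rules = []
for r in range(1, len(itemset)):               # antecedent sizes 1 and 2
    for ante in combinations(sorted(itemset), r):
        X = set(ante)
        Y = itemset - X
        s = Fraction(sigma(itemset), len(transactions))  # same for every rule
        c = Fraction(sigma(itemset), sigma(X))           # varies with X
        rules.append((X, Y, s, c))

for X, Y, s, c in rules:
    print(f"{sorted(X)} -> {sorted(Y)}  s={float(s)}, c={float(c):.2f}")
```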
Mining Association Rules

Two-step approach:
1. Frequent itemset generation
   Generate all itemsets whose support ≥ minsup.
2. Rule generation
   Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset.

Frequent itemset generation is still computationally expensive.

Frequent Itemset Generation
[Figure: the itemset lattice over items A–E, from the null set at the top down to ABCDE.]

Given d items, there are 2^d possible candidate itemsets.

Brute-force approach:
- Each itemset in the lattice is a candidate frequent itemset
- Count the support of each candidate by scanning the database: match each transaction against every candidate
- Complexity ~ O(NMw), with N transactions, M candidates, and w the maximum transaction width
  => expensive, since M = 2^d
Computational Complexity

Given d unique items:
- Total number of itemsets = 2^d
- Total number of possible association rules:

  R = Σ_{k=1}^{d-1} [ C(d, k) × Σ_{j=1}^{d-k} C(d-k, j) ] = 3^d − 2^(d+1) + 1

  If d = 6, R = 602 rules.
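The rule count can be checked numerically; a sketch of both the double sum and the closed form:

```python
from math import comb

def num_rules(d):
    """Total association rules over d items: choose a non-empty antecedent
    of size k, then a non-empty consequent from the remaining d - k items."""
    return sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
               for k in range(1, d))

d = 6
print(num_rules(d))            # 602
print(3**d - 2**(d + 1) + 1)   # closed form, also 602
```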
Frequent Itemset Generation Strategies

- Reduce the number of candidates (M)
  Complete search: M = 2^d. Use pruning techniques to reduce M.

- Reduce the number of transactions (N)
  Reduce the size of N as the size of the itemset increases. Used by DHP and vertical-based mining algorithms.

- Reduce the number of comparisons (NM)
  Use efficient data structures to store the candidates or transactions. No need to match every candidate against every transaction.

Reducing Number of Candidates

Apriori principle:
  If an itemset is frequent, then all of its subsets must also be frequent.

The Apriori principle holds due to the following property of the support measure:

  ∀ X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

The support of an itemset never exceeds the support of its subsets; this is known as the anti-monotone property of support.
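The anti-monotone property can be verified exhaustively on the five example transactions; a brute-force sketch that checks every immediate subset relation (which suffices by induction):

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
items = sorted({i for t in transactions for i in t})

def sigma(X):
    return sum(1 for t in transactions if set(X) <= t)

# s(X) >= s(Y) must hold whenever X is an immediate subset of Y.
violations = [(X, Y)
              for r in range(2, len(items) + 1)
              for Y in combinations(items, r)
              for X in combinations(Y, r - 1)
              if sigma(X) < sigma(Y)]
print(len(violations))  # 0
```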
Illustrating Apriori Principle

[Figure: itemset lattice in which an itemset is found to be infrequent and all of its supersets are pruned.]

Illustrating Apriori Principle
Items (1-itemsets):

Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets):

Itemset          Count
{Bread, Milk}    3
{Bread, Beer}    2
{Bread, Diaper}  3
{Milk, Beer}     2
{Milk, Diaper}   3
{Beer, Diaper}   3

(No need to generate candidates involving Coke or Eggs.)

Triplets (3-itemsets):

Itemset                Count
{Bread, Milk, Diaper}  3

Minimum support = 3

If every subset is considered: 6C1 + 6C2 + 6C3 = 41 candidates.
With support-based pruning: 6 + 6 + 1 = 13 candidates.
Apriori Algorithm

Method:
- Let k = 1
- Generate frequent itemsets of length 1
- Repeat until no new frequent itemsets are identified:
  - Generate length-(k+1) candidate itemsets from length-k frequent itemsets
  - Prune candidate itemsets containing subsets of length k that are infrequent
  - Count the support of each candidate by scanning the DB
  - Eliminate candidates that are infrequent, leaving only those that are frequent
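The method above can be sketched end to end; a compact, unoptimized Python version, run on the five example transactions with minimum support 3. On that data it generates the same 6 + 6 + 1 = 13 candidates as the pruning illustration:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise frequent-itemset generation with candidate pruning.
    Returns (frequent itemsets with their counts, total candidates counted)."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]   # k = 1
    frequent, n_candidates, k = {}, 0, 1
    while candidates:
        n_candidates += len(candidates)
        # Count the support of each candidate by scanning the DB.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= minsup}
        frequent.update(level)
        # Join step: union frequent k-itemsets into (k+1)-candidates;
        # prune step: keep only candidates whose every k-subset is frequent.
        nxt = set()
        for a, b in combinations(level, 2):
            u = a | b
            if len(u) == k + 1 and all(frozenset(s) in level
                                       for s in combinations(u, k)):
                nxt.add(u)
        candidates, k = sorted(nxt, key=sorted), k + 1
    return frequent, n_candidates

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
freq, n = apriori(transactions, minsup=3)
print(n)          # 13 candidates in total (6 + 6 + 1)
print(len(freq))  # 8 frequent itemsets
```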
Reducing Number of Comparisons

Candidate counting:
- Scan the database of transactions to determine the support of each candidate itemset
- To reduce the number of comparisons, store the candidates in a hash structure
- Instead of matching each transaction against every candidate, match it against the candidates contained in the hashed buckets

Generate Hash Tree
Suppose you have 15 candidate itemsets of length 3:

{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4},
{5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

You need:
- A hash function
- A max leaf size: the maximum number of itemsets stored in a leaf node (if the number of candidate itemsets exceeds the max leaf size, split the node)

Association Rule Discovery: Hash Tree

[Figures: the candidate hash tree built from the 15 itemsets. The hash function sends items 1, 4, 7 to the left branch, 2, 5, 8 to the middle branch, and 3, 6, 9 to the right branch; successive slides highlight hashing on 1, 4 or 7, on 2, 5 or 8, and on 3, 6 or 9.]
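A minimal sketch of the first level of this structure: assuming the hash function h(i) = (i − 1) mod 3, which sends items 1, 4, 7 to branch 0, items 2, 5, 8 to branch 1, and items 3, 6, 9 to branch 2, each candidate is routed at the root by its first item. A real hash tree would keep splitting overfull leaves by hashing on later items.

```python
from collections import defaultdict

# The 15 candidate 3-itemsets from the slide.
candidates = [(1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8),
              (1, 5, 9), (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5),
              (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8)]

def h(item):
    """Sends 1,4,7 -> branch 0; 2,5,8 -> branch 1; 3,6,9 -> branch 2."""
    return (item - 1) % 3

# Root level of the hash tree: route each candidate by its first item.
root = defaultdict(list)
for c in candidates:
    root[h(c[0])].append(c)

print([len(root[b]) for b in range(3)])  # [7, 2, 6]
```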
Subset Operation

Given a transaction t, what are the possible subsets of size 3?

[Figure: recursive enumeration of the size-3 subsets of transaction {1 2 3 5 6}: first fix 1, 2, or 3 as the smallest item (1 + {2 3 5 6}, 2 + {3 5 6}, 3 + {5 6}), then recurse on the remainder, e.g. 1 2 + {3 5 6}, 1 3 + {5 6}, 1 5 + {6}.]

Subset Operation Using Hash Tree

[Figure: the transaction {1 2 3 5 6} is pushed down the candidate hash tree; at each node the remaining items are hashed to select which branches to visit.]

Match the transaction against 11 out of 15 candidates.
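The subset count can be checked directly; a sketch that enumerates the C(5,3) = 10 size-3 subsets of the transaction {1 2 3 5 6} and tests which candidates the transaction actually contains. The traversal in the figure touches 11 of the 15 candidates; of those, only the candidates that are genuine subsets of the transaction get their support counts incremented.

```python
from itertools import combinations

transaction = (1, 2, 3, 5, 6)
subsets = list(combinations(transaction, 3))  # all size-3 subsets
print(len(subsets))  # 10

candidates = [(1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8),
              (1, 5, 9), (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5),
              (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8)]

# A candidate is supported by the transaction iff it is one of these subsets.
contained = [c for c in candidates if c in set(subsets)]
print(contained)  # [(1, 2, 5), (1, 3, 6), (3, 5, 6)]
```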
Factors Affecting Complexity

- Choice of minimum support threshold
  Lowering the support threshold results in more frequent itemsets; this may increase the number of candidates and the max length of frequent itemsets.

- Dimensionality (number of items) of the data set
  More space is needed to store the support count of each item; if the number of frequent items also increases, both computation and I/O costs may increase.

- Size of database
  Since Apriori makes multiple passes, the run time of the algorithm may increase with the number of transactions.

- Average transaction width
  Transaction width increases with denser data sets. This may increase the max length of frequent itemsets and the number of hash tree traversals (the number of subsets in a transaction increases with its width).
Compact Representation of Frequent Itemsets

Some itemsets are redundant because they have identical support as their supersets.

Number of frequent itemsets