Rule Mining and the Apriori Algorithm
MIT 15.097 Course Notes
Cynthia Rudin
The Apriori algorithm is often called the "first thing data miners try," but somehow it doesn't appear in most data mining textbooks or courses! Start with market basket data: a binary matrix $M$ whose rows are the $m$ transactions and whose columns are the items, with $M_{ij} = 1$ if transaction $i$ contains item $j$ and $M_{ij} = 0$ otherwise.
Some important definitions:

Itemset: a subset of items, e.g., (bananas, cherries, elderberries), indexed by $\{2, 3, 5\}$.

Support of an itemset: the number of transactions containing it,
$$\mathrm{Supp}(\text{bananas, cherries, elderberries}) = \sum_{i=1}^{m} M_{i,2}\, M_{i,3}\, M_{i,5}.$$

Confidence of rule $a \to b$: the fraction of times itemset $b$ is purchased when itemset $a$ is purchased:
$$\mathrm{Conf}(a \to b) = \frac{\mathrm{Supp}(a \cup b)}{\mathrm{Supp}(a)} = \frac{\#\text{ times } a \text{ and } b \text{ are purchased}}{\#\text{ times } a \text{ is purchased}} = \hat{P}(b \mid a).$$

We want to find all strong rules. These are rules $a \to b$ such that:
$$\mathrm{Supp}(a \cup b) \geq \theta \quad \text{and} \quad \mathrm{Conf}(a \to b) \geq \mathrm{minconf}.$$
Here $\theta$ is called the minimum support threshold.
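These definitions are easy to compute directly from the basket matrix. Here is a minimal sketch in Python, using a small made-up matrix $M$ (the transactions and column indices are hypothetical, chosen only for illustration):

```python
import numpy as np

# A made-up binary basket matrix: rows are transactions, columns are items.
M = np.array([
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
])

def supp(itemset):
    """Supp: number of transactions containing every item in `itemset`."""
    return int(M[:, sorted(itemset)].all(axis=1).sum())

def conf(a, b):
    """Conf(a -> b) = Supp(a u b) / Supp(a), an estimate of P(b | a)."""
    return supp(a | b) / supp(a)

print(supp({1, 4}))    # -> 3  (items 1 and 4 co-occur in three transactions)
print(conf({1}, {4}))  # -> 1.0  (item 4 appears whenever item 1 does)
```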
The support has a monotonicity property called downward closure: if $\mathrm{Supp}(a \cup b) \geq \theta$, then $\mathrm{Supp}(a) \geq \theta$ and $\mathrm{Supp}(b) \geq \theta$. That is, if $a \cup b$ is a frequent itemset, then so are $a$ and $b$. This is because
$$\mathrm{Supp}(a \cup b) = \#\text{ times } a \text{ and } b \text{ are purchased} \ \leq\ \#\text{ times } a \text{ is purchased} = \mathrm{Supp}(a).$$

Apriori finds all frequent itemsets ($a$ such that $\mathrm{Supp}(a) \geq \theta$). We can use Apriori's result to get all strong rules $a \to b$ as follows. For each frequent itemset $\ell$:

- Find all nonempty subsets of $\ell$.
- For each subset $a$, output $a \to \ell \setminus a$ whenever $\mathrm{Supp}(\ell)/\mathrm{Supp}(a) \geq \mathrm{minconf}$.

Now for Apriori. Use the downward closure property: generate all $k$-itemsets (itemsets of size $k$) from $(k-1)$-itemsets. It's a breadth-first search.
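The rule-extraction loop can be sketched as follows. The frequent itemsets and their support counts below are made up, and $\mathrm{minconf} = 0.6$ is an arbitrary choice for illustration:

```python
from itertools import combinations

# Made-up support counts for a small family of frequent itemsets.
support = {
    frozenset('a'): 25, frozenset('c'): 30, frozenset('d'): 45,
    frozenset('ac'): 25, frozenset('ad'): 15, frozenset('cd'): 20,
    frozenset('acd'): 15,
}
minconf = 0.6

rules = []
for ell, supp_ell in support.items():
    # every nonempty proper subset a of ell is a candidate antecedent
    for r in range(1, len(ell)):
        for a in map(frozenset, combinations(ell, r)):
            # Conf(a -> ell \ a) = Supp(ell) / Supp(a)
            if supp_ell / support[a] >= minconf:
                rules.append((set(a), set(ell - a)))
```

For instance, $\{a,d\} \to \{c\}$ is output here because $\mathrm{Supp}(\{a,c,d\})/\mathrm{Supp}(\{a,d\}) = 15/15 = 1$, while $\{d\} \to \{a,c\}$ is not, since $15/45 < 0.6$.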
Example, with $\theta = 10$ (the items include apples, bananas, cherries, elderberries, and grapes; label the items $a, b, c, \ldots, g$):

1-itemsets:  {a}   {b}   {c}   {d}   {e}   {f}   {g}
supp:         25    20    30    45    29     5    17
({f} is crossed out: its support 5 is below $\theta = 10$.)

2-itemsets:  {a,b}  {a,c}  {a,d}  {a,e}  ...  {e,g}
supp:            7     25     15     23         3
({a,b} and {e,g} are crossed out.)

3-itemsets:  {a,c,d}  {a,c,e}  {b,d,g}  ...
supp:             15       22       15

4-itemsets:  {a,c,d,e}
supp:               12

Apriori Algorithm:
Input: matrix $M$ and minimum support threshold $\theta$.

$L_1 = \{$frequent 1-itemsets$\}$, i.e., all items $i$ such that $\mathrm{Supp}(i) \geq \theta$.

For $k = 2$; while $L_{k-1} \neq \emptyset$ (while there are large $(k-1)$-itemsets); $k{+}{+}$:
    $C_k = \mathrm{apriori\_gen}(L_{k-1})$   (generate candidate itemsets of size $k$)
    $L_k = \{c \in C_k : \mathrm{Supp}(c) \geq \theta\}$   (frequent itemsets of size $k$; this loops over the transactions, scanning the database)
end

Output: $\bigcup_k L_k$.

The subroutine apriori_gen joins $L_{k-1}$ to $L_{k-1}$.
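Putting the pieces together, here is a sketch of the whole algorithm in Python on a small four-transaction database, with a deliberately naive version of the candidate generator (the actual join step described below is more careful, pairing only itemsets that share a lexicographic prefix):

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Naive join + prune: all size-k unions whose (k-1)-subsets are frequent."""
    candidates = set()
    for x in L_prev:
        for y in L_prev:
            u = x | y
            if len(u) == k and all(frozenset(s) in L_prev
                                   for s in combinations(u, k - 1)):
                candidates.add(u)
    return candidates

def apriori(transactions, theta):
    def supp(c):
        return sum(1 for t in transactions if c <= t)

    items = {i for t in transactions for i in t}
    L = {frozenset([i]) for i in items if supp(frozenset([i])) >= theta}
    frequent = set(L)
    k = 2
    while L:  # breadth-first: level k is built from level k-1
        L = {c for c in apriori_gen(L, k) if supp(c) >= theta}
        frequent |= L
        k += 1
    return frequent

T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(sorted(map(sorted, apriori(T, 2))))
# -> [[1], [1, 3], [2], [2, 3], [2, 3, 5], [2, 5], [3], [3, 5], [5]]
```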
apriori_gen subroutine:

Input: $L_{k-1}$.

Join: find all pairs of itemsets in $L_{k-1}$ whose first $k-2$ items are identical, and union them (lexicographically) to get $C_k^{\text{too big}}$, e.g.,
$$\{a,b,c,d,e,f\},\ \{a,b,c,d,e,g\}\ \to\ \{a,b,c,d,e,f,g\}.$$

Prune: $C_k = \{c \in C_k^{\text{too big}} : \text{all } (k-1)\text{-subsets } c_s \text{ of } c \text{ obey } c_s \in L_{k-1}\}$.

Output: $C_k$.

Example of the Prune step: consider $\{a,b,c,d,e,f,g\}$, which is in $C_7^{\text{too big}}$, and I want to know whether it belongs in $C_7$. Look at its 6-subsets $\{b,c,d,e,f,g\}$, $\{a,c,d,e,f,g\}$, $\{a,b,d,e,f,g\}$, $\{a,b,c,e,f,g\}$, etc. If any of them is not in $L_6$, then prune $\{a,b,c,d,e,f,g\}$ from $C_7^{\text{too big}}$, so it never enters $C_7$.

The Apriori Algorithm
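A sketch of apriori_gen itself, treating each itemset as a sorted tuple so that the lexicographic join reduces to checking that the first $k-2$ entries agree:

```python
from itertools import combinations

def apriori_gen(L_prev):
    """Join + prune step of Apriori, given the frequent (k-1)-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in L_prev)
    k = len(prev[0]) + 1
    keep = set(map(frozenset, prev))
    # Join step: union pairs whose first k-2 items are identical.
    too_big = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            if prev[i][:-1] == prev[j][:-1]:
                too_big.add(tuple(sorted(set(prev[i]) | set(prev[j]))))
    # Prune step: every (k-1)-subset of a candidate must itself be frequent.
    return {c for c in too_big
            if all(frozenset(s) in keep for s in combinations(c, k - 1))}

# The join from the text: {a..f} and {a..g} share their first five items, but
# the candidate {a..g} is pruned because e.g. {a,b,c,d,f,g} is not in L_6 here.
print(apriori_gen([set('abcdef'), set('abcdeg')]))    # -> set()
print(apriori_gen([{1, 3}, {2, 3}, {2, 5}, {3, 5}]))  # -> {(2, 3, 5)}
```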
Example transaction database:

TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5
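Walking the first two levels by hand on this database, assuming a minimum support threshold of $\theta = 2$ (a value chosen here only for illustration):

```python
from itertools import combinations

# The four-transaction example database, with theta = 2 (an assumed threshold).
T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
theta = 2

def supp(c):
    """Number of transactions containing every item in c."""
    return sum(1 for t in T if set(c) <= t)

items = sorted({i for t in T for i in t})
L1 = [(i,) for i in items if supp((i,)) >= theta]
print(L1)  # item 4 drops out: its support is only 1

L2 = [c for c in combinations([i for (i,) in L1], 2) if supp(c) >= theta]
print(L2)  # {1,2} and {1,5} fail; {1,3}, {2,3}, {2,5}, {3,5} survive
```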