
Rule Mining and the Apriori Algorithm

MIT 15.097 Course Notes

Cynthia Rudin

The Apriori algorithm - often called the "first thing data miners try," but somehow doesn't appear in most data mining textbooks or courses!

Start with market basket data: an m × n binary matrix M whose rows are transactions and whose columns are items, with M_{i,j} = 1 if transaction i contains item j.

Some important definitions:

Itemset: a subset of items, e.g., (bananas, cherries, elderberries), indexed by {2, 3, 5}.

Support of an itemset: the number of transactions containing it,

$$\mathrm{Supp}(\text{bananas, cherries, elderberries}) = \sum_{i=1}^{m} M_{i,2}\, M_{i,3}\, M_{i,5}.$$

Confidence of rule a → b: the fraction of times itemset b is purchased when itemset a is purchased,

$$\mathrm{Conf}(a \to b) = \frac{\mathrm{Supp}(a \cup b)}{\mathrm{Supp}(a)} = \frac{\#\,\text{times } a \text{ and } b \text{ are purchased}}{\#\,\text{times } a \text{ is purchased}} = \hat{P}(b \mid a).$$

We want to find all strong rules. These are rules a → b such that:

$$\mathrm{Supp}(a \cup b) \ge \theta \quad \text{and} \quad \mathrm{Conf}(a \to b) \ge \text{minconf}.$$

Here θ is called the minimum support threshold.
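To make these definitions concrete, here is a minimal Python sketch (the helper names supp and conf are mine, not from the notes), assuming the market basket data is stored as the binary matrix M described above:

```python
import numpy as np

def supp(M, itemset):
    # Support: number of transactions (rows of M) containing every
    # item in `itemset`, where items are identified by column index.
    return int(np.all(M[:, sorted(itemset)] == 1, axis=1).sum())

def conf(M, a, b):
    # Confidence of the rule a -> b: Supp(a u b) / Supp(a).
    return supp(M, set(a) | set(b)) / supp(M, a)

# Toy data: the four transactions of "Database D" from the worked
# example later in these notes, with items 1..5 stored as columns 0..4.
M = np.array([[1, 0, 1, 1, 0],   # {1, 3, 4}
              [0, 1, 1, 0, 1],   # {2, 3, 5}
              [1, 1, 1, 0, 1],   # {1, 2, 3, 5}
              [0, 1, 0, 0, 1]])  # {2, 5}

print(supp(M, {1, 2}))    # items 2 and 3 co-occur in 2 transactions
print(conf(M, {1}, {4}))  # Conf(item 2 -> item 5) = 3/3 = 1.0
```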

The support has a monotonicity property called downward closure:

If Supp(a ∪ b) ≥ θ, then Supp(a) ≥ θ and Supp(b) ≥ θ. That is, if a ∪ b is a frequent itemset, then so are a and b. This holds because

$$\mathrm{Supp}(a \cup b) = \#\,\text{times } a \text{ and } b \text{ are purchased} \le \#\,\text{times } a \text{ is purchased} = \mathrm{Supp}(a).$$

Apriori finds all frequent itemsets (a such that Supp(a) ≥ θ). We can use Apriori's result to get all strong rules a → b as follows:

For each frequent itemset ℓ:

- Find all nonempty proper subsets of ℓ.
- For each subset a, output the rule a → {ℓ \ a} whenever Supp(ℓ)/Supp(a) ≥ minconf. (This loop is sketched in code just below.)

Now for Apriori. Use the downward closure property: generate all k-itemsets (itemsets of size k) from the (k−1)-itemsets. It's a breadth-first search.
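Here is the rule-generation loop above as a minimal Python sketch (a hypothetical helper, consistent with the apriori sketch given later: `freq` maps each frequent itemset, stored as a frozenset, to its support):

```python
from itertools import chain, combinations

def strong_rules(freq, minconf):
    # freq: dict mapping frozenset(itemset) -> support for ALL frequent
    # itemsets; by downward closure, every subset of l is also a key.
    rules = []
    for l, supp_l in freq.items():
        nonempty_proper = chain.from_iterable(
            combinations(l, r) for r in range(1, len(l)))
        for a in map(frozenset, nonempty_proper):
            if supp_l / freq[a] >= minconf:  # Conf(a -> l\a) = Supp(l)/Supp(a)
                rules.append((a, l - a))
    return rules
```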

Example (θ = 10; the items include apples, bananas, cherries, elderberries, grapes, ...):

1-itemsets:  {a}  {b}  {c}  {d}  {e}  {f}  {g}
supp:         25   20   30   45   29    5   17      ({f} is pruned: 5 < θ)

2-itemsets:  {a,b}  {a,c}  {a,d}  {a,e}  ...  {e,g}
supp:            7     25     15     23  ...      3      ({a,b} and {e,g} are pruned)

3-itemsets:  {a,c,d}  {a,c,e}  {b,d,g}  ...
supp:             15       22       15

4-itemsets:  {a,c,d,e}
supp:               12

Apriori Algorithm:

Input: matrix M (and minimum support threshold θ)

L_1 = {frequent 1-itemsets, i.e., items i such that Supp(i) ≥ θ}

For k = 2; while L_{k−1} ≠ ∅ (while there are large (k−1)-itemsets); k++:

- C_k = apriori_gen(L_{k−1})   (generate candidate itemsets of size k)
- L_k = {c : c ∈ C_k, Supp(c) ≥ θ}   (frequent itemsets of size k; loop over transactions, i.e., scan the database)

end

Output: $\bigcup_k L_k$.
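As a minimal Python sketch (my own naming; the database here is a list of transaction sets rather than the matrix M, and it relies on the apriori_gen subroutine sketched below):

```python
def apriori(transactions, theta):
    # transactions: list of sets of items; theta: minimum support threshold.
    def supp(c):
        # For clarity this scans the database once per candidate; a real
        # implementation counts all candidates in one scan per level.
        return sum(1 for t in transactions if c <= t)

    candidates = [frozenset([i]) for i in set().union(*transactions)]  # C_1
    freq = {}                                  # running union of all L_k
    while candidates:                          # while L_{k-1} was nonempty
        L_k = {c: s for c in candidates if (s := supp(c)) >= theta}
        freq.update(L_k)
        candidates = apriori_gen(list(L_k))    # C_{k+1} (subroutine below)
    return freq
```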

The subroutine apriori_gen joins L_{k−1} to L_{k−1}.

apriori_gen Subroutine:

Input: L_{k−1}

Join: find all pairs of itemsets in L_{k−1} whose first k−2 items are identical, and union them (lexicographically) to get C_k^{too big}, e.g.,

$$\{a,b,c,d,e,f\},\ \{a,b,c,d,e,g\} \ \to\ \{a,b,c,d,e,f,g\}.$$

Prune: $C_k = \{c \in C_k^{\text{too big}} : \text{all } (k-1)\text{-subsets } c_s \text{ of } c \text{ obey } c_s \in L_{k-1}\}$.

Output: C_k.
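A matching Python sketch of the subroutine (same assumptions as before; itemsets are frozensets, handled internally as sorted tuples so that the "first k−2 items" comparison is the lexicographic one from the notes):

```python
def apriori_gen(L_prev):
    # L_prev: the frequent (k-1)-itemsets, as a collection of frozensets.
    prev = set(L_prev)
    ordered = sorted(tuple(sorted(s)) for s in prev)

    # Join: union pairs whose first k-2 items are identical.
    C_too_big = set()
    for i, p in enumerate(ordered):
        for q in ordered[i + 1:]:
            if p[:-1] != q[:-1]:
                break                          # sorted, so no later q matches
            C_too_big.add(frozenset(p + q[-1:]))

    # Prune: keep c only if every (k-1)-subset of c is in L_{k-1}.
    return [c for c in C_too_big
            if all(c - {x} in prev for x in c)]
```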

Example of the Prune step: consider {a,b,c,d,e,f,g}, which is in C_7^{too big}, and I want to know whether it belongs in C_7. Look at {b,c,d,e,f,g}, {a,c,d,e,f,g}, {a,b,d,e,f,g}, {a,b,c,e,f,g}, etc. If any of these are not in L_6, then prune {a,b,c,d,e,f,g} from C_7.

The Apriori Algorithm (worked example; here the minimum support is 2):

Database D:

TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D to count the candidate 1-itemsets:

C_1:  Itemset   Sup.        L_1:  Itemset   Sup.
      {1}       2                 {1}       2
      {2}       3                 {2}       3
      {3}       3                 {3}       3
      {4}       1                 {5}       3
      {5}       3

C_2 = apriori_gen(L_1); scan D again:

C_2:  Itemset   Sup.        L_2:  Itemset   Sup.
      {1 2}     1                 {1 3}     2
      {1 3}     2                 {2 3}     2
      {1 5}     1                 {2 5}     3
      {2 3}     2                 {3 5}     2
      {2 5}     3
      {3 5}     2

C_3 = apriori_gen(L_2); scan D a third time:

C_3:  Itemset              L_3:  Itemset   Sup.
      {2 3 5}                    {2 3 5}   2

Note: {1,2,3}, {1,2,5} and {1,3,5} are not in C_3. Apriori scans the database at most how many times?
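Running the earlier sketches on this database reproduces the slide (assuming the hypothetical apriori/apriori_gen from above):

```python
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]   # Database D

freq = apriori(D, theta=2)
for c in sorted(freq, key=lambda c: (len(c), sorted(c))):
    print(sorted(c), freq[c])
# Prints, one itemset per line:
#   L_1: [1] 2, [2] 3, [3] 3, [5] 3             ({4} was pruned)
#   L_2: [1, 3] 2, [2, 3] 2, [2, 5] 3, [3, 5] 2
#   L_3: [2, 3, 5] 2
```

As for the question above: each pass of the while-loop scans D once, so the number of scans is at most one more than the size of the largest frequent itemset (three scans here).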

Huge number of candidate sets. Spawned a huge number of Apriori-like papers.

What do you do with the rules after they're generated?

- Information overload (give up)
- Order rules by "interestingness," e.g. (a small lift computation is sketched below):
  - Confidence: $\hat{P}(b \mid a) = \mathrm{Supp}(a \cup b)/\mathrm{Supp}(a)$
  - "Lift"/"Interest": $\dfrac{\hat{P}(b \mid a)}{\hat{P}(b)} = \dfrac{\mathrm{Supp}(a \cup b)/\mathrm{Supp}(a)}{\mathrm{Supp}(b)/m}$
  - Hundreds!
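For instance, lift can be computed with the matrix-based helpers from the start of the notes (same hypothetical supp/conf; m is the number of transactions):

```python
def lift(M, a, b):
    # Lift(a -> b) = P-hat(b|a) / P-hat(b), with P-hat(b) = Supp(b) / m.
    m = M.shape[0]
    return conf(M, a, b) / (supp(M, b) / m)

print(lift(M, {1}, {4}))   # Conf = 1.0 and P-hat({5}) = 3/4, so lift = 4/3
```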

Research questions:

- mining more than just itemsets (e.g., sequences, trees, graphs)
- incorporating taxonomy in items
- boolean logic and "logical analysis of data"

Cynthia's questions: Can we use rules within ML to get good predictive models?

https://ocw.mit.edu

https://ocw.mit.edu/terms