CS4220: Knowledge Discovery Methods for Bioinformatics
Unit 1c: Essence of Knowledge Discovery
Part C: Data Mining
Wong Limsoon
Lecture Outline
- Clustering, aka unsupervised learning
- Association rule mining
- Classification, aka supervised learning
Clustering
Objective of Cluster Analysis
- Find groups of objects such that objects in a group are similar (or related) to one another, and different from (or unrelated to) objects in other groups
- Intra-cluster distances are minimized (cohesive, compact)
- Inter-cluster distances are maximized (distinctive, apart)
How Many Clusters? Can Be Ambiguous
[Figure: the same set of points interpreted as two, four, or six clusters]
Supervised vs. Unsupervised Learning
Supervised learning (aka classification)
- Training data (observations, measurements, etc.) are accompanied by class labels
- New data is classified based on the training data

Unsupervised learning (aka clustering)
- Class labels of the training data are unknown
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes in the data

Typical Clustering Techniques
- Partitional clustering, e.g., K-means: division of the data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset
- Hierarchical clustering, e.g., the agglomerative approach: a set of nested clusters organized as a hierarchical tree
- Subspace clustering and bi-/co-clustering: simultaneous clustering on a subset of tuples and a subset of attributes

Partitional Clustering: K-Means
- Each cluster has a centroid
- Each point is assigned to the cluster with the closest centroid
- The number of clusters, K, must be specified

More Details of K-Means Clustering
- Initial centroids are often chosen randomly, so the clusters produced vary from one run to another
- The centroid is typically the mean of the points in the cluster
- "Closeness" can be measured by Euclidean distance, cosine similarity, correlation, etc.
- K-means usually converges in a few iterations
- Complexity is O(n * K * i * d), where n = # of points, K = # of clusters, i = # of iterations, d = # of attributes
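To make the procedure concrete, here is a minimal NumPy sketch of K-means as described above; the function name and defaults are illustrative, and a production version would also handle clusters that become empty:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initial centroids are often chosen randomly (here: K random data points).
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Assign each point to the cluster with the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points
        # (a robust version would guard against empty clusters here).
        new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centroids, centroids):  # usually converges in a few iterations
            break
        centroids = new_centroids
    return labels, centroids
```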
Example Iterations by K-Means
[Figure: six panels showing the cluster assignments and centroids over iterations 1-6 on a 2-D point set]
Two Different K-means Clusterings
[Figure: the same original points clustered two ways, yielding an optimal clustering and a sub-optimal clustering]
Evaluating K-means Clusters
- Sum of Squared Error (SSE) is commonly used
- The error of a point is its distance to the nearest centroid; square these errors and sum them to get the SSE:

  SSE = \sum_{i=1}^{K} \sum_{x \in C_i} dist(m_i, x)^2

  where C_i is a cluster and m_i is its centroid
- SSE can be reduced by increasing K, the # of clusters, but a good clustering with a smaller K can have a lower SSE than a poor clustering with a higher K
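The formula translates directly into a few lines of NumPy (a sketch; names are illustrative):

```python
import numpy as np

def sse(X, labels, centroids):
    # Sum, over clusters, of squared distances from each point to its centroid.
    return sum(((X[labels == i] - m) ** 2).sum()
               for i, m in enumerate(centroids))
```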
Importance of Choosing Initial Centroids
[Figure: five panels showing K-means iterations 1-5 from a different choice of initial centroids]
Solutions to Initial Centroid Problem
- Multiple runs: helps, but probability is not on your side
- Use hierarchical clustering to determine the initial centroids
- Select more than K initial centroids, then keep the most widely separated among them
- Use more advanced algorithms that are less sensitive to initialization issues, as sketched below
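The slide does not name a specific advanced algorithm; k-means++-style seeding is one widely used example, sketched here under that assumption:

```python
import numpy as np

def kmeans_pp_init(X, K, seed=0):
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]  # first centroid: uniform at random
    for _ in range(K - 1):
        # Squared distance from each point to its nearest chosen centroid.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        # Sample the next centroid proportionally to that squared distance,
        # so far-away points are more likely to seed new clusters.
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)
```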
Limitations of K-means
- Has problems when clusters are of differing sizes, differing densities, or non-globular shapes
- Also has problems when the data contain outliers

Overcoming K-means Limitations
- One solution is to use many clusters: find parts of clusters, but then the parts need to be put together
[Figure: panels showing this workaround on clusters of differing sizes, differing densities, and non-globular shapes]
Hierarchical Clustering
- Organize similar data into groups
- Form the groups into a hierarchical tree structure, termed a dendrogram
- Offers useful visual descriptions of the data

Two approaches
- Agglomerative: build the tree by finding the most related objects first
- Divisive: build the tree by finding the most dissimilar objects first

Distance Matrix
- Square and symmetrical
- Each element's value is based on a similarity function, e.g., Euclidean distance
- Also termed a similarity matrix or a proximity matrix

Agglomerative Hierarchical Clustering
- The basic algorithm is straightforward
- The key is computing the proximity of two clusters; the different approaches to defining the distance between clusters distinguish the different algorithms

Basic algorithm:
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat:
4.   Merge the two closest clusters
5.   Update the proximity matrix
6. Until only a single cluster remains
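A from-scratch sketch of this loop, assuming single linkage (closest pair of points) as the proximity definition; the linkage choices are covered a few slides below, and the names here are illustrative:

```python
import numpy as np

def agglomerative(X):
    # Start: each data point is its own cluster.
    clusters = [[i] for i in range(len(X))]
    # Proximity matrix of pairwise point distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    merges = []
    while len(clusters) > 1:
        # Find the two closest clusters (single linkage: closest pair of points).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # Merge the two closest clusters; the proximity "update" is implicit,
        # since distances are recomputed from cluster membership each round.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```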
Starting Situation
- Start with clusters of individual points and a proximity matrix
[Figure: points p1-p5 and their pairwise proximity matrix]
Intermediate Situation
- After some merging steps, we have some clusters
[Figure: clusters C1-C5 and the updated proximity matrix]
Intermediate Situation
- We want to merge the two closest clusters (C2 and C5) and update the proximity matrix
[Figure: clusters C1-C5, with C2 and C5 about to be merged]
Defining Inter-Cluster Similarity
- Single linkage
- Complete linkage
- Average linkage
- Distance between centroids
- Other methods use an objective function, e.g., squared error
Finally, we get a resulting dendrogram.
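In practice, SciPy's scipy.cluster.hierarchy implements the merge loop and the dendrogram directly; the method argument selects the inter-cluster distance definition from the previous slide (a sketch on toy data):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.default_rng(0).normal(size=(20, 2))  # toy data
# method: "single", "complete", "average", "centroid", or "ward" (squared error).
Z = linkage(X, method="average")
dendrogram(Z)
plt.show()
```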
Strengths of Hierarchical Clustering
- No need to assume any particular # of clusters: any desired number of clusters can be obtained by cutting the dendrogram at the proper level
- The clusters may correspond to meaningful taxonomies, e.g., in the biological sciences (the animal kingdom)
Divisive Hierarchical Clustering
- Start with one, all-inclusive cluster
- At each step, split a cluster, until each cluster contains a single point (or there are k clusters)
To build an MST (Minimum Spanning Tree):
- Start with a tree that consists of any single point
- In successive steps, look for the closest pair of points (p, q) such that p is in the current tree but q is not
- Add q to the tree and put an edge between p and q
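The procedure above is Prim's algorithm; here is a short illustrative sketch. (A common divisive use, not spelled out on the slide, is to then split clusters by breaking the longest MST edges.)

```python
import numpy as np

def build_mst(X):
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    in_tree = {0}                      # start with a tree of any single point
    edges = []
    while len(in_tree) < n:
        # Closest pair (p, q) with p in the tree and q outside it.
        p, q = min(((p, q) for p in in_tree for q in range(n) if q not in in_tree),
                   key=lambda e: D[e])
        edges.append((p, q, D[p, q]))  # add q to the tree with edge (p, q)
        in_tree.add(q)
    return edges
```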
Subspace Clustering
- Cluster boundaries are clear only with respect to the subspaces

Bi- or Co-Clustering
- Simultaneous clustering on a subset of attributes and a subset of tuples
High-Dimensional Data
Many applications need clustering on high-dimensional data:
- Text documents
- Microarray data

Major challenges:
- Many irrelevant dimensions may mask clusters
- Distance measures become meaningless, as most points become nearly equi-distant from one another
- Clusters may exist only in some subspaces
Curse of Dimensionality
- Data in only one dimension is relatively tightly packed
- Adding more dimensions spreads the points across each new dimension, making them further apart
- High-dimensional data is therefore sparse
- Distance measures become meaningless, as most data points become equi-distant from each other
Image credit: Parsons et al., KDD Explorations, 2004
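A small illustrative simulation of this effect (not from the lecture): as the dimensionality d grows, the nearest and farthest neighbours of a point become almost equally distant, so the distance contrast vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (1, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)   # distances from one point
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative contrast={contrast:.3f}")  # shrinks as d grows
```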
Why Subspace Clustering?
- Clusters may exist only in some subspaces
- Subspace clustering: find clusters in all the subspaces
- Exercise: which dimension combinations are best for identifying which clusters?
Image credit: Parsons et al., KDD Explorations, 2004
However, inspect your subspace clusters carefully!
[Figure: a cloud of points in 3D, and its 2-D projections onto the XZ, YZ, and XY planes]
Image credit: Eamonn Keogh