Clustering Lecture 3: Hierarchical Methods Jing Gao SUNY Buffalo 1 Motivation, definition, evaluation the dendrogram at the desired level, then each
Previous PDF | Next PDF |
[PDF] Hierarchical Clustering / Dendrograms
The agglomerative hierarchical clustering algorithms available in this In this example we can compare our interpretation with an actual plot of the data
[PDF] Chapter 7 Hierarchical cluster analysis
describing the algorithm, or set of instructions, which creates the dendrogram results In this chapter we demonstrate hierarchical clustering on a small example
[PDF] Hierarchical Agglomerative Clustering - Université Lumière Lyon 2
Hierarchical cluster analysis The cluster dendrogram is very important to Example #HAC - single linkage cah
[PDF] Hierarchical Clustering - Princeton University Computer Science
For example, we partition organisms into different species, but science has also Hierarchical clustering constructs a (usually binary) tree over the data The leaves are distance matrix that was used to compute the dendrogram Notice how
[PDF] Hierarchical clustering - CMU Statistics
29 jan 2013 · Given the linkage, hierarchical clustering produces a sequence of clustering Example of a dendrogram with no inversions 1 7 4 5 6 2 3
[PDF] CSE601 Hierarchical Clustering
Clustering Lecture 3: Hierarchical Methods Jing Gao SUNY Buffalo 1 Motivation, definition, evaluation the dendrogram at the desired level, then each
[PDF] Hierarchical clustering for gene expression data analysis
clusters) 2 Clustering of samples (columns) => identification of sub-types of related Dendrogram 1 For agglomerative hierarchical clustering we start with
[PDF] Hierarchical Clustering - Introduction to Information Retrieval
An HAC clustering is typically visualized as a dendrogram as shown in DENDROGRAM We will see an example of an inversion in Figure 17 12 Hierarchical
[PDF] Automatic Extraction of Clusters from Hierarchical Clustering
dendrogram or reachability plot, guided by a visual inspection of the graphical representation The second Figure 1: Example dataset 1: hierarchical clustering
[PDF] hierarchical clustering elbow method python
[PDF] hierarchical clustering in r datanovia
[PDF] hierarchical clustering python scikit learn
[PDF] hierarchical clustering python scipy example
[PDF] hierarchical inheritance in java
[PDF] hierarchical network
[PDF] hierarchical network design pdf
[PDF] hierarchical regression table apa
[PDF] hierarchical structure journal article
[PDF] hierarchy java example
[PDF] hierarchy of law reports
[PDF] hifly a321
[PDF] hifly a380 interior
[PDF] hifly a380 model
Clustering
Lecture 3: Hierarchical Methods
Jing Gao
SUNY Buffalo
1Outline
Basics
Motivation, definition, evaluation
Methods
Partitional
Hierarchical
Density-based
Mixture model
Spectral methods
Advanced topics
Clustering ensemble
Clustering in MapReduce
Semi-supervised clustering, subspace clustering, co-clustering, etc. 2 3Hierarchical Clustering
Agglomerative approach
b d c e a a b d e c d e a b c d eStep 0 Step 1 Step 2 Step 3 Step 4 bottom-up
Initialization:
Each object is a cluster
Iteration:
Merge two clusters which are
most similar to each other;Until all objects are merged
into a single cluster 4Hierarchical Clustering
Divisive Approaches
b d c e a a b d e c d e a b c d eStep 4 Step 3 Step 2 Step 1 Step 0 Top-down
Initialization:
All objects stay in one cluster
Iteration:
Select a cluster and split it into
two sub clustersUntil each leaf cluster contains
only one object 5Dendrogram
A tree that shows how clusters are merged/split
hierarchically Each node on the tree is a cluster; each leaf node is a singleton cluster 6Dendrogram
A clustering of the data objects is obtained by cutting the dendrogram at the desired level, then each connected component forms a clusterAgglomerative Clustering Algorithm
More popular hierarchical clustering technique
Basic algorithm is straightforward
1.Compute the distance matrix
2.Let each data point be a cluster
3.Repeat
4. Merge the two closest clusters
5. Update the distance matrix
6.Until only a single cluster remains
Key operation is the computation of the distance between two clusters Different approaches to defining the distance between clusters distinguish the different algorithms 7Starting Situation
Start with clusters of individual points and a distance matrix p1 p3 p5 p4 p2 p1 p2 p3 p4 p5 . . . . Distance Matrix p1p2p3p4p9p10p11p128Intermediate Situation
After some merging steps, we have some clusters
Choose two clusters that has the smallest
distance (largest similarity) to merge C1 C4 C2 C5 C3 C2 C1 C1 C3 C5 C4 C2C3 C4 C5
Distance Matrix
p1p2p3p4p9p10p11p12 9Intermediate Situation
We want to merge the two closest clusters (C2 and C5) and update the distance matrix. C1 C4 C2 C5 C3 C2 C1 C1 C3 C5 C4 C2C3 C4 C5
Distance Matrix
p1p2p3p4p9p10p11p12 10After Merging
C1 C4C2 U C5
C3 C2 U C5 C1 C1 C3 C4C2 U C5
C3 C4Distance Matrix
p1p2p3p4p9p10p11p12 11How to Define Inter-Cluster Distance
p1 p3 p5 p4 p2 p1 p2 p3 p4 p5 . . .Distance?
YMIN YMAXYGroup Average
YDistance Between Centroids
Distance Matrix
12MIN or Single Link
Inter-cluster distance
The distance between two clusters is represented by the distance of the closest pair of data objects belonging to different clusters. Determined by one pair of points, i.e., by one link in the proximity graph ),(min),( ,minqpdCCd MINNested Clusters Dendrogram
1 2 3 4 5 6 1 2 3 4 53625410
0.05 0.1 0.15 0.2 14Strength of MIN
Original Points Two Clusters
ͻ Can handle non-elliptical shapes
15Limitations of MIN
Original Points
Two Clusters
ͻ Sensitive to noise and outliers
16MAX or Complete Link
Inter-cluster distance
The distance between two clusters is represented by the distance of the farthest pair of data objects belonging to different clusters ),(max),( ,minqpdCCd MAXNested Clusters Dendrogram
3641250
0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 1 2 3 4 5 6 1 2 5 3 4 18Strength of MAX
Original Points
Two Clusters
ͻ Less susceptible to noise and outliers
19Limitations of MAX
Original Points
ͻTends to break large clusters
20 21MIN (2 clusters) MAX (2 clusters)
Limitations of MAX
ͻBiased towards globular clusters
Group Average or Average Link
Inter-cluster distance
The distance between two clusters is represented by the average distance of all pairs of data objects belonging to different clusters Determined by all pairs of points in the two clusters ,minqpdavgCCdGroup Average
Nested Clusters Dendrogram
3641250
0.05 0.1 0.15 0.2 0.25 1 2 3 4 5 6 1 2 5 3 4 23Group Average
Compromise between Single and Complete
LinkStrengths
Less susceptible to noise and outliers
Limitations
Biased towards globular clusters
24Centroid Distance
Inter-cluster distance
The distance between two clusters is represented by the distance between the centers of the clustersDetermined by cluster centroids
),(),(jijimeanmmdCCd 25Similarity of two clusters is based on the increase in squared error when two clusters are merged Similar to group average if distance between points is distance squared
Less susceptible to noise and outliers
Biased towards globular clusters
Hierarchical analogue of K-means
Can be used to initialize K-means
26Comparison
Group Average
Ward's Method
1 2 3 4 5 6 1 2 5 3 4MIN MAX
1 2 3 4 5 6 1 2 5 3 4 1 2 3 4 5 6 1 2 5 3 4 1 2 3 4 5 6 1 2 3 4 5 27Time and Space Requirements
O(N2) space since it uses the distance matrix
N is the number of points
O(N3) time in many cases
There are N steps and at each step the size, N2,
distance matrix must be updated and searchedComplexity can be reduced to O(N2 log(N) ) time
for some approaches 28Strengths
Do not have to assume any particular number
of clustersAny desired number of clusters can be obtained
They may correspond to meaningful
taxonomies camera, ..), furniture, groceries 29Problems and Limitations
Once a decision is made to combine two clusters, it cannot be undoneNo objective function is directly minimized
Different schemes have problems with one or more of the following:Sensitivity to noise and outliers
Difficulty handling different sized clusters and irregular shapesBreaking large clusters
30Take-away Message
Agglomerative and divisive hierarchical clusteringSeveral ways of defining inter-cluster distance
The properties of clusters outputted by different
approaches based on different inter-cluster distance definitionPros and cons of hierarchical clustering
31quotesdbs_dbs17.pdfusesText_23