

K-Means Clustering

Sargur Srihari

srihari@cedar.buffalo.edu


Topics in Mixture Models and EM

• Mixture models
• K-means clustering
• Mixtures of Gaussians
  - Maximum likelihood
  - EM for Gaussian mixtures
• EM algorithm
• Gaussian mixture models motivate EM
• Latent variable viewpoint
• K-means seen as non-probabilistic limit of EM applied to mixtures of Gaussians
• EM in generality
  - Bernoulli mixture models
  - Infinite mixture models


K-means Clustering

• Given a data set {x_n}, n = 1,..,N, in D-dimensional Euclidean space
• Partition it into K clusters (K is given)
• One-of-K coding
  - Indicator variable r_nk ∈ {0,1}, with k = 1,..,K, describes which of the K clusters data point x_n is assigned to
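A minimal illustration of this one-of-K coding, assuming NumPy (the assignments array is hypothetical):

```python
import numpy as np

# Hypothetical example: N = 4 data points, K = 3 clusters.
# assignments[n] is the cluster index that x_n belongs to.
assignments = np.array([0, 2, 1, 2])
N, K = len(assignments), 3

# r[n, k] = 1 if point n is assigned to cluster k, else 0 (one-of-K coding)
r = np.zeros((N, K), dtype=int)
r[np.arange(N), assignments] = 1
```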


Objective function optimization

• Objective function:

  $$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \,\lVert x_n - \mu_k \rVert^2$$

  - The sum of squared distances of each data point to its assigned vector µ_k
• Goal is to find values for the {r_nk} and the {µ_k} so as to minimize J
  - Can be done by an iterative procedure
  - Because the objective is a linear function of r_nk, that optimization can be performed easily to give a closed-form solution
  - Each iteration has two steps: successive optimization w.r.t. r_nk and µ_k
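A minimal sketch of J, assuming NumPy arrays X (N×D data), R (N×K indicators), and mu (K×D means); all names are illustrative:

```python
import numpy as np

def distortion(X, R, mu):
    """J = sum_n sum_k r_nk * ||x_n - mu_k||^2."""
    diff = X[:, None, :] - mu[None, :, :]   # N x K x D, broadcasted x_n - mu_k
    sq_dist = (diff ** 2).sum(axis=2)       # N x K matrix of squared distances
    return float((R * sq_dist).sum())
```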


Two Updating Stages

• First choose initial values for the µ_k
• First phase: minimize J w.r.t. r_nk, keeping the µ_k fixed
• Second phase: minimize J w.r.t. µ_k, keeping the r_nk fixed
• The two stages correspond to the E (expectation) and M (maximization) steps of the EM algorithm
  - Expectation: what is the expected class?
  - Maximization: what is the maximum likelihood estimate (MLE) of the mean?


E: Determination of Indicator r_nk

• Because J is a linear function of r_nk, this optimization is performed easily (closed-form solution)
• Terms involving different n are independent
  - Therefore we can optimize for each n separately
  - Choose r_nk to be 1 for whichever value of k gives the minimum value of ||x_n − µ_k||²
• Thus

  $$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \lVert x_n - \mu_j \rVert^2 \\ 0 & \text{otherwise} \end{cases}$$

• Interpretation: assign x_n to the cluster whose mean is closest
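A minimal NumPy sketch of this assignment step, continuing the X, mu conventions above:

```python
import numpy as np

def e_step(X, mu):
    """Assign each point to its nearest mean; return the N x K indicator matrix."""
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # N x K
    nearest = sq_dist.argmin(axis=1)          # index of the closest mean per point
    R = np.zeros(sq_dist.shape, dtype=int)
    R[np.arange(len(X)), nearest] = 1         # r_nk = 1 only for the argmin k
    return R
```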


M: Optimization of µ_k

• Hold the r_nk fixed
• The objective function J is a quadratic function of µ_k
• It is minimized by setting its derivative w.r.t. µ_k to zero:

  $$2 \sum_{n=1}^{N} r_{nk} \,(x_n - \mu_k) = 0$$

• Which is solved to give

  $$\mu_k = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}}$$

  - The denominator is equal to the number of points assigned to cluster k
• Interpretation: set µ_k to the mean of all data points x_n assigned to cluster k
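And a matching sketch of the mean update; the empty-cluster guard is a practical addition, not part of the derivation:

```python
import numpy as np

def m_step(X, R):
    """Set each mean to the average of the data points assigned to it."""
    counts = R.sum(axis=0)                 # number of points per cluster
    counts = np.maximum(counts, 1)         # guard against empty clusters
    return (R.T @ X) / counts[:, None]     # K x D: mu_k = sum_n r_nk x_n / sum_n r_nk
```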


Termination of K-Means

• Two phases:
  - Re-assigning data points to clusters
  - Re-computing the means of the clusters
• Repeated until there is no further change in the assignments
• Since each phase reduces J, convergence is assured
• May converge to a local minimum of J
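Putting the two phases together, a minimal sketch of the full loop, using the hypothetical e_step and m_step helpers above:

```python
import numpy as np

def k_means(X, K, seed=0):
    """Alternate E and M steps until the assignments stop changing."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # init: K random data points
    R = e_step(X, mu)
    while True:
        mu = m_step(X, R)              # M step: re-compute the means
        R_new = e_step(X, mu)          # E step: re-assign the points
        if (R_new == R).all():         # no assignment changed: converged
            return R, mu
        R = R_new
```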


Illustration of K-means

• Old Faithful dataset
• A single Gaussian is a poor fit
• We choose K = 2
• The data set is standardized so that each variable has zero mean and unit standard deviation
• Assigning each data point to the nearest cluster center is equivalent to asking which side of the perpendicular bisector of the line joining the two cluster centers it falls on
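The standardization step, as a minimal sketch:

```python
import numpy as np

def standardize(X):
    """Give each variable (column) zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```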


K-means iterations

[Figure: alternating E and M steps on the standardized Old Faithful data, from the initial choice of means (parameters) to the final clusters and means.]

• E step: parameters are fixed, distributions (cluster assignments) are optimized
• M step: distributions are fixed, parameters are optimized


Cost Function after Iteration

• J for the Old Faithful data
• A poor initial value was chosen for the cluster centers
  - Several steps were needed for convergence
  - A better choice is to set the initial µ_k equal to a random subset of K data points
• K-means is itself used to initialize the parameters of a Gaussian mixture model before applying EM
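On the last point, scikit-learn's GaussianMixture follows this recipe by default; a usage sketch (X is assumed to hold the standardized data):

```python
from sklearn.mixture import GaussianMixture

# init_params="kmeans" (the default) runs k-means first and uses the result
# to initialize the EM iterations.
gmm = GaussianMixture(n_components=2, init_params="kmeans", random_state=0)
labels = gmm.fit_predict(X)
```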


Faster Implementation of K-means

• A direct implementation can be slow
  - In the E step, Euclidean distances are computed between every mean and every data point
  - ||x_n − µ_k||² is computed for n = 1,..,N and k = 1,..,K
• Faster implementations exist
  - Precompute trees in which nearby points are on the same sub-tree
  - Use the triangle inequality to avoid unnecessary distance calculations
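The triangle-inequality variant (Elkan's algorithm) is available off the shelf in scikit-learn; a usage sketch:

```python
from sklearn.cluster import KMeans

# algorithm="elkan" applies the triangle inequality to skip distance
# computations that cannot change a point's nearest mean.
km = KMeans(n_clusters=2, algorithm="elkan", n_init=10, random_state=0)
labels = km.fit_predict(X)
```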


On-line Stochastic Version

• Instead of batch-processing the entire data set
• Apply the Robbins-Monro procedure
  - To find the roots of the regression function given by the derivative of J w.r.t. µ_k
  - This gives the sequential update

    $$\mu_k^{\text{new}} = \mu_k^{\text{old}} + \eta_n \,(x_n - \mu_k^{\text{old}})$$

  - where η_n is a learning-rate parameter made to decrease monotonically as more samples are observed
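A minimal sketch of this sequential update, assuming the common schedule η_n = 1/n_k (one over the number of points cluster k has absorbed so far); mu is a float K×D array of initial means:

```python
import numpy as np

def online_k_means(stream, mu):
    """Sequential (Robbins-Monro) k-means: update one data point at a time."""
    counts = np.zeros(len(mu))
    for x in stream:
        k = ((mu - x) ** 2).sum(axis=1).argmin()  # nearest mean
        counts[k] += 1
        eta = 1.0 / counts[k]                     # eta_n decreases monotonically
        mu[k] += eta * (x - mu[k])                # mu_k_new = mu_k_old + eta (x_n - mu_k_old)
    return mu
```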


Dissimilarity Measure

• Euclidean distance has limitations
  - Inappropriate for categorical labels
  - Cluster means are non-robust to outliers
• Use a more general dissimilarity measure V(x, x′) between pairs of points
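For instance, swapping in the Manhattan (L1) distance and updating each center with the per-coordinate median gives a K-medians variant that is more robust to outliers; a hedged sketch, with all names illustrative:

```python
import numpy as np

def k_medians(X, K, n_iter=100, seed=0):
    """K-means with L1 dissimilarity and median updates (robust to outliers)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment under the Manhattan (L1) distance instead of squared Euclidean.
        d = np.abs(X[:, None, :] - mu[None, :, :]).sum(axis=2)   # N x K
        labels = d.argmin(axis=1)
        # Update each center with the per-coordinate median of its points.
        new_mu = np.array([np.median(X[labels == k], axis=0)
                           if (labels == k).any() else mu[k] for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return labels, mu
```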