Machine Learning

1. K-Means Clustering

Sargur Srihari
srihari@cedar.buffalo.edu
2. Topics in Mixture Models and EM

• Mixture models
• K-means clustering
• Mixtures of Gaussians
  – Maximum likelihood
  – EM for Gaussian mixtures
• EM algorithm
  – Gaussian mixture models motivate EM
  – Latent variable viewpoint
  – K-means seen as the non-probabilistic limit of EM applied to mixtures of Gaussians
• EM in generality
  – Bernoulli mixture models
  – Infinite mixture models
3. K-means Clustering

• Given a data set {x_n}, n = 1,...,N, in D-dimensional Euclidean space
• Partition it into K clusters (K is given)
• One-of-K coding
  – Indicator variable r_nk ∈ {0,1}, with k = 1,...,K
  – Describes which of the K clusters data point x_n is assigned to (a sketch of this coding follows)
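As an illustration, here is a minimal NumPy sketch of one-of-K coding; the names `assignments` and `R` are ours, not from the slides:

```python
import numpy as np

# Hypothetical example: N = 4 points, K = 3 clusters.
# assignments[n] is the index of the cluster that point n belongs to.
assignments = np.array([0, 2, 1, 0])
N, K = assignments.shape[0], 3

# One-of-K (one-hot) indicator matrix: R[n, k] = 1 iff point n is in cluster k.
R = np.zeros((N, K), dtype=int)
R[np.arange(N), assignments] = 1
print(R)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 0]]
```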
4. Objective Function Optimization

• The objective function is

  J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \|x_n - \mu_k\|^2

  – the sum of the squares of the distances of each data point to its assigned vector µ_k
• Goal: find values for {r_nk} and the {µ_k} that minimize J
  – Can be done by an iterative procedure
  – Each iteration has two steps: successive optimization w.r.t. r_nk and µ_k
• Because the objective is a linear function of r_nk, that optimization can be performed easily to give a closed-form solution (a NumPy sketch of computing J follows)
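A minimal sketch of evaluating J, assuming the array names `X`, `R`, and `mu` (ours, not the slides'):

```python
import numpy as np

def kmeans_objective(X, R, mu):
    """Distortion J = sum_n sum_k R[n,k] * ||x_n - mu_k||^2.

    X:  (N, D) data matrix
    R:  (N, K) one-of-K indicator matrix
    mu: (K, D) cluster means
    """
    # Squared distance of every point to every mean: shape (N, K).
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return (R * sq_dists).sum()
```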
5. Two Updating Stages

• First choose initial values for µ_k
• First phase: minimize J w.r.t. r_nk, keeping µ_k fixed
• Second phase: minimize J w.r.t. µ_k, keeping r_nk fixed
• The two stages correspond to the E (expectation) and M (maximization) steps of the EM algorithm
  – Expectation: what is the expected class?
  – Maximization: what is the MLE of the mean?
6. E: Determination of Indicator r_nk

• Because

  J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \|x_n - \mu_k\|^2

  is a linear function of r_nk, this optimization is performed easily (closed-form solution)
• Terms involving different n are independent
  – Therefore we can optimize for each n separately
  – Choose r_nk to be 1 for whichever value of k gives the minimum value of ||x_n − µ_k||²
• Thus

  r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \|x_n - \mu_j\|^2 \\ 0 & \text{otherwise} \end{cases}

• Interpretation: assign x_n to the cluster whose mean is closest (see the sketch below)
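A minimal NumPy sketch of this E step; the function and variable names are ours:

```python
import numpy as np

def e_step(X, mu):
    """Assign each point to its nearest mean; return a (N, K) indicator matrix."""
    # (N, K) matrix of squared distances ||x_n - mu_k||^2.
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    nearest = sq_dists.argmin(axis=1)        # index of the closest mean per point
    R = np.zeros(sq_dists.shape, dtype=int)
    R[np.arange(X.shape[0]), nearest] = 1    # one-hot encode the assignment
    return R
```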
7. M: Optimization of µ_k

• Hold r_nk fixed
• The objective function

  J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \|x_n - \mu_k\|^2

  is a quadratic function of µ_k
• Minimize by setting its derivative w.r.t. µ_k to zero:

  2 \sum_{n=1}^{N} r_{nk} (x_n - \mu_k) = 0

• Solving gives

  \mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}

  – The denominator equals the number of points assigned to cluster k
• Interpretation: set µ_k to the mean of all data points x_n assigned to cluster k (see the sketch below)
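A minimal NumPy sketch of this M step, using the same assumed array names as above; the empty-cluster guard is our addition, not discussed on the slides:

```python
import numpy as np

def m_step(X, R):
    """Recompute each mean as the average of the points assigned to it."""
    counts = R.sum(axis=0)              # points per cluster, shape (K,)
    counts = np.maximum(counts, 1)      # guard against empty clusters (our choice)
    return (R.T @ X) / counts[:, None]  # mu_k = sum_n R[n,k] x_n / N_k
```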
8. Termination of K-Means

• The two phases
  – re-assigning data points to clusters
  – re-computing the means of the clusters
  are repeated until there is no further change in the assignments
• Since each phase reduces J, convergence is assured
• It may, however, converge to a local minimum of J (the full loop is sketched below)
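Putting the two phases together, a minimal sketch of the full batch algorithm using the `e_step` and `m_step` functions defined above; the random-subset initialization matches the recommendation on a later slide:

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Plain batch K-means; returns the final means and indicator matrix."""
    rng = np.random.default_rng(seed)
    # Initialize means with a random subset of K data points.
    mu = X[rng.choice(X.shape[0], size=K, replace=False)]
    R = e_step(X, mu)
    for _ in range(max_iters):
        mu = m_step(X, R)               # M: update means with assignments fixed
        R_new = e_step(X, mu)           # E: re-assign points with means fixed
        if np.array_equal(R_new, R):    # stop when assignments no longer change
            break
        R = R_new
    return mu, R
```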
9. Illustration of K-means

• Old Faithful dataset
• A single Gaussian is a poor fit
• We choose K = 2
• The data set is standardized so that each variable has zero mean and unit standard deviation (a one-line sketch follows)
• Assigning each data point to the nearest cluster center is equivalent to asking which side of the perpendicular bisector of the line joining the two cluster centers the point lies on
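For completeness, a sketch of the standardization step (the function name is ours):

```python
import numpy as np

def standardize(X):
    """Give each variable (column) zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```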
10. K-means Iterations

[Figure: a sequence of panels showing alternating E and M steps on the Old Faithful data, from the initial choice of means (parameters) to the final clusters and means.]

• E step: parameters are fixed; the distributions (cluster assignments) are optimized
• M step: distributions are fixed; the parameters are optimized
11. Cost Function after Each Iteration

• J for the Old Faithful data
• A poor initial value was chosen for the cluster centers
  – Several steps are needed for convergence
  – A better choice is to initialize the µ_k with a random subset of K data points
• K-means is itself used to initialize the parameters of a Gaussian mixture model before applying EM
12. Faster Implementation of K-means

• A direct implementation can be slow
  – In the E step, Euclidean distances are computed between every mean and every data point
  – ||x_n − µ_k||² is computed for n = 1,...,N and k = 1,...,K
• Faster implementations exist
  – Precompute trees in which nearby points are in the same subtree
  – Use the triangle inequality to avoid unnecessary distance calculations (a tree-based E step is sketched below)
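One way to realize the tree-based idea, as a sketch: build a KD-tree over the K cluster centers and query it for each point's nearest center. Using `scipy.spatial.cKDTree` is our choice here; the slides do not name a specific data structure, and for small K the plain distance matrix may be just as fast.

```python
import numpy as np
from scipy.spatial import cKDTree

def e_step_tree(X, mu):
    """E step via a KD-tree over the means instead of an N x K distance matrix."""
    tree = cKDTree(mu)           # index the K cluster centers
    _, nearest = tree.query(X)   # nearest center for each data point
    R = np.zeros((X.shape[0], mu.shape[0]), dtype=int)
    R[np.arange(X.shape[0]), nearest] = 1
    return R
```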
13. On-line Stochastic Version

• Instead of batch-processing the entire data set, apply the Robbins-Monro procedure
  – to finding the roots of the regression function given by the derivative of J w.r.t. µ_k:

  \mu_k^{\text{new}} = \mu_k^{\text{old}} + \eta_n (x_n - \mu_k^{\text{old}})

  – where η_n is a learning-rate parameter that is made to decrease monotonically as more samples are observed (see the sketch below)
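A minimal sketch of the sequential update for a single new point. The schedule η_n = 1/N_k (the reciprocal of the per-cluster count) is one common monotonically decreasing choice; the slides leave the schedule unspecified, and all names here are ours:

```python
import numpy as np

def online_kmeans_update(mu, x, counts):
    """Robbins-Monro style update of the means for one new data point x.

    mu:     (K, D) current means (updated in place)
    x:      (D,) new observation
    counts: (K,) number of points seen per cluster so far
    """
    k = ((mu - x) ** 2).sum(axis=1).argmin()  # nearest current mean
    counts[k] += 1
    eta = 1.0 / counts[k]                     # decreasing learning rate eta_n
    mu[k] += eta * (x - mu[k])                # mu_new = mu_old + eta (x - mu_old)
    return mu, counts
```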