
Data Mining

Cluster Analysis: Basic Concepts and Algorithms

Lecture Notes for Chapter 8

Introduction to Data Mining by Tan, Steinbach, Kumar

© Tan, Steinbach, Kumar Introduction to Data Mining 4/18/2004

What is Cluster Analysis?

Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups

Inter-cluster distances are maximized

Intra-cluster distances are minimized
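The two goals above can be made concrete with a small calculation. The following is a minimal sketch, assuming Euclidean distance and a hypothetical toy data set of two groups of 2-D points; the function names are illustrative and do not come from the slides.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: two small groups of 2-D points.
clusters = {
    "a": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "b": [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)],
}

def mean_intra_cluster_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [
        dist(p, q)
        for pts in clusters.values()
        for p, q in combinations(pts, 2)
    ]
    return sum(pairs) / len(pairs)

def mean_inter_cluster_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    pairs = [
        dist(p, q)
        for pts_a, pts_b in combinations(clusters.values(), 2)
        for p in pts_a
        for q in pts_b
    ]
    return sum(pairs) / len(pairs)

intra = mean_intra_cluster_distance(clusters)
inter = mean_inter_cluster_distance(clusters)
print(f"intra={intra:.3f} inter={inter:.3f}")
```

A grouping that matches the definition should give a small intra-cluster value relative to the inter-cluster value; averaging is only one possible way to summarize the pairwise distances.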

Applications of Cluster Analysis

Understanding

Group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations

Summarization

Reduce the size of large data sets

Discovered Clusters and Industry Group

Cluster 1 (Industry Group: Technology1-DOWN): Cabletron-Sys-DOWN, CISCO-DOWN, HP-DOWN, DSC-Comm-DOWN, INTEL-DOWN, LSI-Logic-DOWN, Sun-DOWN

Cluster 2 (Industry Group: Technology2-DOWN): Apple-Comp-DOWN, Autodesk-DOWN, DEC-DOWN, ADV-Micro-Device-DOWN, Andrew-Corp-DOWN, Computer-Assoc-DOWN, Circuit-City-DOWN, Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN

Cluster 3 (Industry Group: Financial-DOWN): Fannie-Mae-DOWN, Fed-Home-Loan-DOWN, MBNA-Corp-DOWN, Morgan-Stanley-DOWN

Cluster 4 (Industry Group: Oil-UP): Schlumberger-UP

Figure: Clustering precipitation in Australia

What is not Cluster Analysis?

Supervised classification

Have class label information

Simple segmentation

Dividing students into different registration groups alphabetically, by last name

Results of a query

Groupings are a result of an external specification

Graph partitioning

Some mutual relevance and synergy, but areas are not identical

Types of Clusterings

A clustering is a set of clusters

Important distinction between hierarchical and partitional sets of clusters

Partitional Clustering

A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset

Hierarchical clustering

A set of nested clusters organized as a hierarchical tree
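The difference between the two kinds of clusterings can be checked mechanically. This is a minimal sketch, assuming clusters are represented as Python sets of object names; `is_partitional` and `is_nested` are hypothetical helper names, and the four points are toy data.

```python
def is_partitional(objects, clusters):
    """True if every object is in exactly one cluster (no overlap, no gaps)."""
    members = [obj for cluster in clusters for obj in cluster]
    return len(members) == len(set(members)) and set(members) == set(objects)

def is_nested(clusters):
    """True if any two clusters are either disjoint or one contains the other."""
    sets = [set(c) for c in clusters]
    return all(
        a.isdisjoint(b) or a <= b or b <= a
        for i, a in enumerate(sets)
        for b in sets[i + 1:]
    )

points = ["p1", "p2", "p3", "p4"]

# A partitional clustering: two non-overlapping subsets covering all points.
partition = [{"p1", "p2"}, {"p3", "p4"}]

# A hierarchical clustering: every level of a tree, from leaves to root.
hierarchy = [
    {"p1"}, {"p2"}, {"p3"}, {"p4"},
    {"p1", "p2"}, {"p3", "p4"},
    {"p1", "p2", "p3", "p4"},
]

print(is_partitional(points, partition), is_nested(hierarchy))
```

Cutting the hierarchy at one level (for example, keeping only the two middle clusters) yields a partitional clustering, which is why the nested form carries strictly more information.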

Partitional Clustering

Figure panels: Original Points; A Partitional Clustering

Hierarchical Clustering

Figure panels: Hierarchical Clustering of points p1, p2, p3, p4; Dendrogram

Hierarchical Clustering

Figure panels: Hierarchical Clustering; Dendrogram

Source: http://cs.jhu.edu/~razvanm/fs-expedition/tux3.html

Types of Clusters

Well-separated clusters

Center-based clusters

Contiguous clusters

Density-based clusters

Property or Conceptual

Described by an Objective Function


Types of Clusters: Well-Separated

Well-Separated Clusters:

A cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster.

Figure: 3 well-separated clusters
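The well-separated definition can be tested directly on a labeled point set. The sketch below assumes Euclidean distance and integer cluster labels; `is_well_separated` is a hypothetical name, and singleton clusters are skipped because they have no other member to compare against.

```python
from math import dist

def is_well_separated(points, labels):
    """Check that each point is closer to every point in its own cluster
    than to any point outside that cluster."""
    for i, p in enumerate(points):
        same = [
            dist(p, q)
            for j, q in enumerate(points)
            if j != i and labels[j] == labels[i]
        ]
        other = [
            dist(p, q)
            for j, q in enumerate(points)
            if labels[j] != labels[i]
        ]
        # The farthest same-cluster point must still beat the nearest outsider.
        if same and other and max(same) >= min(other):
            return False
    return True

far_apart = [(0, 0), (0, 1), (10, 10), (10, 11)]
too_close = [(0, 0), (0, 4), (0, 5), (0, 9)]
print(is_well_separated(far_apart, [0, 0, 1, 1]))
print(is_well_separated(too_close, [0, 0, 1, 1]))
```

The comparison uses the maximum same-cluster distance, which is what makes this the strictest of the cluster types listed in these notes.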

Types of Clusters: Center-Based

Center-based

A cluster is a set of objects such that an object in a cluster is closer (more similar) to the "center" of its cluster than to the center of any other cluster. The center of a cluster is often a centroid, the average of all the points in the cluster, or a medoid, the most "representative" point of a cluster.

Figure: 4 center-based clusters
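The two kinds of center can be computed in a few lines. This is a minimal sketch, assuming Euclidean distance and taking "most representative" to mean the point with the smallest total distance to the others, which is one common reading of medoid; the function names and the toy points are illustrative.

```python
from math import dist

def centroid(points):
    """Coordinate-wise average of the points; need not be a data point."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def medoid(points):
    """The data point with the smallest total distance to all other points."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))

def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda k: dist(point, centers[k]))

# Hypothetical cluster with one far-away point that pulls the centroid.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (14.0, 0.0)]
print(centroid(cluster))
print(medoid(cluster))
print(nearest_center((1.0, 0.0), [(0.0, 0.0), (10.0, 0.0)]))
```

In this toy cluster the centroid falls between data points and is dragged toward the distant point, while the medoid is always an actual member of the cluster.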

Types of Clusters: Contiguity-Based

Contiguous Cluster (Nearest neighbor or Transitive)

A cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster.

Figure: 8 contiguous clusters
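The contiguity condition only asks for one close neighbor inside the cluster, so elongated chains qualify. The sketch below assumes Euclidean distance; `is_contiguity_based` is a hypothetical name, and singleton clusters are skipped because the definition needs at least one other member.

```python
from math import dist

def is_contiguity_based(points, labels):
    """Check that each point's nearest same-cluster point is strictly closer
    than its nearest point from any other cluster."""
    for i, p in enumerate(points):
        same = [
            dist(p, q)
            for j, q in enumerate(points)
            if j != i and labels[j] == labels[i]
        ]
        other = [
            dist(p, q)
            for j, q in enumerate(points)
            if labels[j] != labels[i]
        ]
        if same and other and min(same) >= min(other):
            return False
    return True

# A chain of seven points followed by a short second group (toy data).
chain = [(x, 0) for x in range(7)] + [(9, 0), (10, 0)]
good_labels = [0] * 7 + [1, 1]
bad_labels = [0] * 6 + [1, 1, 1]  # the chain's last point is put in the wrong group

print(is_contiguity_based(chain, good_labels))
print(is_contiguity_based(chain, bad_labels))
```

With `good_labels` the chain is contiguous even though its two ends are farther from each other than from the second group, so the same labeling would fail a well-separated test.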

Types of Clusters: Density-Based

Density-based

A cluster is a dense region of points that is separated from other high-density regions by low-density regions. Used when the clusters are irregular or intertwined, and when noise and outliers are present.

Figure: 6 density-based clusters
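One simple way to decide which points lie in a dense region is to count neighbors within a fixed radius. This is a sketch under assumptions: the radius `eps` and the threshold `min_pts` are hypothetical parameters in the style of density-based algorithms such as DBSCAN, not values given in these notes, and the code only flags dense points rather than forming full clusters.

```python
from math import dist

def dense_points(points, eps, min_pts):
    """Indices of points that have at least min_pts points
    (the point itself included) within distance eps."""
    return [
        i
        for i, p in enumerate(points)
        if sum(dist(p, q) <= eps for q in points) >= min_pts
    ]

# Toy data: a tight blob of four points plus one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
print(dense_points(data, eps=1.5, min_pts=3))
```

The isolated point falls below the threshold and is left out, which is how a density-based view treats noise and outliers.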

Characteristics of the Input Data Are Important

Type of proximity or density measure
- This is a derived measure, but central to clustering

Sparseness
- Dictates type of similarity
- Adds to efficiency

Attribute type
- Dictates type of similarity

Type of Data
- Dictates type of similarity
- Other characteristics, e.g., autocorrelation

Dimensionality

Noise and Outliers

Type of Distribution
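The point that sparseness dictates the type of similarity can be illustrated with two coefficients for binary vectors. The choice of the simple matching coefficient and the Jaccard coefficient is an assumption made for this sketch; the notes above do not name specific measures, and the vectors are toy data.

```python
def simple_matching(x, y):
    """Fraction of positions where the two binary vectors agree (0-0 or 1-1)."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def jaccard(x, y):
    """Shared 1s divided by positions where at least one vector has a 1."""
    both = sum(a == 1 and b == 1 for a, b in zip(x, y))
    either = sum(a == 1 or b == 1 for a, b in zip(x, y))
    return both / either if either else 0.0

# Two sparse vectors that share no nonzero entry.
x = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print(simple_matching(x, y))
print(jaccard(x, y))
```

On sparse data the many shared zeros make the simple matching coefficient look high even though the objects have nothing in common, whereas a measure that ignores 0-0 matches does not have this problem.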
