Unsupervised Learning Algorithms

M. Emre Celebi • Kemal Aydin

Editors

M. Emre Celebi

Department of Computer Science

Louisiana State University in Shreveport

Shreveport, LA, USA

Kemal Aydin

North American University

Houston, TX, USA

ISBN 978-3-319-24209-5 ISBN 978-3-319-24211-8 (eBook)

DOI 10.1007/978-3-319-24211-8

Library of Congress Control Number: 2015060229

Springer Cham Heidelberg New York Dordrecht London

© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of

the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,

broadcasting, reproduction on microfilms or in any other physical way, and transmission or information

storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology

now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication

does not imply, even in the absence of a specific statement, that such names are exempt from the relevant

protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book

are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or

the editors give a warranty, express or implied, with respect to the material contained herein or for any

errors or omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)

Preface

With the proliferation of massive amounts of unlabeled data, unsupervised learning algorithms, which can automatically discover interesting and useful patterns in such data, have gained popularity among researchers and practitioners. These algorithms have found numerous applications including pattern recognition, market basket analysis, web mining, social network analysis, information retrieval, recommender systems, market research, intrusion detection, and fraud detection. The difficulty of developing theoretically sound approaches that are amenable to objective evaluation has resulted in the proposal of numerous unsupervised learning algorithms over the past half-century.

The goal of this volume is to summarize the state of the art in unsupervised learning. The intended audience includes researchers and practitioners who are increasingly using unsupervised learning algorithms to analyze their data.

This volume opens with two chapters on anomaly detection. In “Anomaly Detection for Data with Spatial Attributes,” P. Deepak reviews anomaly detection techniques for spatial data developed in the data mining and statistics communities. The author presents a taxonomy of such techniques, describes the most representative ones, and discusses the applications of clustering and image segmentation to anomaly detection.

In “Anomaly Ranking in a High Dimensional Space: The Unsupervised TreeRank Algorithm,” Clémençon et al. describe a computationally efficient anomaly ranking algorithm based on the minimization of the mass-volume criterion. This algorithm does not involve any sampling; therefore, it is especially suited for large and high-dimensional data.

Clustering is undoubtedly the most well-known subfield of unsupervised learning. The volume continues with 12 chapters on clustering.
In “Genetic Algorithms for Subset Selection in Model-Based Clustering,” Scrucca describes a genetic algorithm that maximizes the Bayesian information criterion (BIC) to select a subset of relevant features for model-based clustering. In particular, the criterion is based on the BIC difference between a candidate clustering model for the given subset and a model that assumes no clustering for the same subset. The implementation of this algorithm uses the facilities available in the GA package for the open-source statistical computing environment R.
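
The BIC-difference idea can be illustrated with a small Python sketch. It is not the chapter's method: it scores a single feature instead of searching over subsets with a genetic algorithm, it fits a two-component univariate Gaussian mixture with plain EM, and it assumes the sign convention BIC = 2 log L - p log n, under which larger values are better. All function names are hypothetical and chosen only for this illustration.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log-density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def bic(loglik, n_params, n_obs):
    """BIC with the 'larger is better' sign convention (an assumption here)."""
    return 2.0 * loglik - n_params * math.log(n_obs)

def loglik_no_clustering(xs):
    """Log-likelihood of a single Gaussian, i.e. no cluster structure."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    return sum(gaussian_logpdf(x, mean, var) for x in xs)

def loglik_two_clusters(xs, n_iter=200):
    """Log-likelihood of a two-component Gaussian mixture fitted by EM."""
    n = len(xs)
    grand_mean = sum(xs) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, 1e-6)
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]          # crude initialization at the extremes
    variances = [grand_var, grand_var]

    def component_logs(x):
        return [math.log(weights[k]) + gaussian_logpdf(x, means[k], variances[k])
                for k in range(2)]

    for _ in range(n_iter):
        # E-step: posterior membership probabilities for every point
        resp = []
        for x in xs:
            logs = component_logs(x)
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and variances (with small floors)
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)

    loglik = 0.0
    for x in xs:
        logs = component_logs(x)
        top = max(logs)
        loglik += top + math.log(sum(math.exp(v - top) for v in logs))
    return loglik

def bic_difference(xs):
    """BIC(clustering model) minus BIC(no-clustering model) for one feature."""
    n = len(xs)
    bic_clust = bic(loglik_two_clusters(xs), 5, n)    # 2 means, 2 variances, 1 weight
    bic_none = bic(loglik_no_clustering(xs), 2, n)    # 1 mean, 1 variance
    return bic_clust - bic_none

clustered_feature = [0.0, 0.2, -0.1, 0.1, -0.2, 5.0, 5.2, 4.9, 5.1, 4.8]
print(bic_difference(clustered_feature))  # positive values favor the clustering model
```

In this sketch, a positive difference means the feature carries evidence of cluster structure. A subset-selection procedure would compare such differences across candidate feature subsets rather than across single features.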

In “Clustering Evaluation in High-Dimensional Data,” Tomašev and Radovanović investigate the performance of popular cluster quality assessment indices on synthetically generated high-dimensional Gaussian data. Extensive experiments reveal that dimensionality and degree of cluster overlap can affect both the mean quality score assigned by an index and the stability of its quality estimation. The authors also discover that appropriate treatment of hub points may improve the quality assessment process.

In “Combinatorial Optimization Approaches for Data Clustering,” Festa presents an overview of clustering algorithms with particular emphasis on algorithms based on combinatorial optimization. The author first reviews various mathematical programming formulations of the partitional clustering problem and some exact methods based on them. She then provides a brief survey of partitional clustering algorithms based on heuristics or metaheuristics.

In “Kernel Spectral Clustering and Applications,” Langone et al. present a survey of the recently proposed kernel spectral clustering (KSC) algorithm. The authors describe the basic KSC algorithm as well as its probabilistic, hierarchical, and sparse extensions. They also provide an overview of the various applications of the algorithm such as text clustering, image segmentation, power load clustering, and community detection in big data networks.

In “Uni- and Multi-Dimensional Clustering via Bayesian Networks,” Keivani and Peña discuss model-based clustering using Bayesian networks. For unidimensional clustering, the authors propose the use of the Bayesian structural clustering (BSC) algorithm, which is based on the celebrated expectation-maximization algorithm. For the multidimensional case, the authors propose two algorithms, one based on a generalization of the BSC algorithm and the other based on multidimensional Bayesian network classification. The former algorithm turns out to be computationally demanding, so the authors provide a preliminary evaluation of the latter on two representative data sets.
In “A Radial Basis Function Neural Network Training Mechanism for Pattern Classification Tasks,” Niros and Tsekouras propose a novel approach for designing radial basis function networks (RBFNs) based on hierarchical fuzzy clustering and particle swarm optimization with discriminant analysis. The authors compare the resulting RBFN classifier against various other classifiers on popular data sets from the UCI machine learning repository.

In “A Survey of Constrained Clustering,” Dinler and Tural provide an in-depth overview of the field of constrained clustering (a.k.a. semi-supervised clustering). After giving an introduction to the field of clustering, the authors first review unsupervised clustering. They then present a survey of constrained clustering, where the prior knowledge comes from either labeled data or constraints. Finally, they discuss computational complexity issues and related work in the field.

In “An Overview of the Use of Clustering for Data Privacy,” Torra et al. give a brief overview of data privacy with emphasis on the applications of clustering to data-driven methods. More specifically, they review the use of clustering in masking methods and information loss measures.

In “Nonlinear Clustering: Methods and Applications,” Wang and Lai review clustering algorithms for nonlinearly separable data. The authors focus on four approaches: kernel-based clustering, multi-exemplar-based clustering, graph-based clustering, and support vector clustering. In addition to discussing representative algorithms based on each of these approaches, the authors present applications of these algorithms to computer vision tasks such as image/video segmentation and image categorization.

In “Swarm Intelligence-Based Clustering Algorithms: A Survey,” Inkaya et al. present a detailed survey of algorithms for hard clustering based on swarm intelligence (SI). They categorize SI-based clustering algorithms into five groups: particle swarm optimization based algorithms, ant colony optimization based algorithms, ant based sorting algorithms, hybrid algorithms, and miscellaneous algorithms. In addition, they present a novel taxonomy for SI-based clustering algorithms based on agent representation.

In “Extending Kmeans-Type Algorithms by Integrating Intra-Cluster Compactness and Inter-Cluster Separation,” Huang et al. propose a framework for integrating both intra-cluster compactness and inter-cluster separation criteria in k-means-type clustering algorithms. Based on their proposed framework, the authors design three novel, computationally efficient k-means-type algorithms. The performance of these algorithms is demonstrated on a variety of data sets, using several cluster quality assessment indices.

In “A Fuzzy-Soft Competitive Learning Approach for Grayscale Image Compression,” Tsolakis and Tsekouras propose a novel, two-stage vector quantization algorithm that combines the merits of hard and soft vector quantization paradigms.
The first stage involves a soft competitive learning scheme with a fuzzy neighborhood function, which can measure the lateral neuron interaction phenomenon and the degree of neuron excitations. The second stage improves the partition generated in the first stage by means of a codeword migration strategy. Experimental results on classic grayscale images demonstrate the effectiveness and efficiency of the proposed algorithm in comparison with several state-of-the-art algorithms.

This volume continues with two chapters on the applications of unsupervised learning. In “Unsupervised Learning in Genome Informatics,” Wong et al. review a selection of state-of-the-art unsupervised learning algorithms for genome informatics. The chapter is divided into two parts. In the first part, the authors review various algorithms for protein-DNA binding event discovery and search from sequence patterns to genome-wide levels. In the second part, several algorithms for inferring microRNA regulatory networks are presented.

In “The Application of LSA to the Evaluation of Questionnaire Responses,” Martin et al. investigate the applicability of Latent Semantic Analysis (LSA) to the automated evaluation of responses to essay questions. The authors first describe the nature of essay questions. They then give a detailed overview of LSA including its historical and mathematical background, its use as an unsupervised learning system, and its applications. The authors conclude with a discussion of the application of LSA to automated essay evaluation and a case study involving a driver training system.

This volume concludes with two chapters on miscellaneous topics regarding unsupervised learning. In “Mining Evolving Patterns in Dynamic Relational Networks,” Ahmed and Karypis present various practical algorithms for unsupervised analysis of the temporal evolution of patterns in dynamic relational networks.
The authors introduce various classes of dynamic patterns, which enable the identification of hidden coordination mechanisms underlying the networks, provide information on the recurrence and stability of their relational patterns, and improve the ability to predict the relations and their changes in these networks.

Finally, in “Probabilistically Grounded Unsupervised Training of Neural Networks,” Trentin and Bongini present a survey of probabilistic interpretations of artificial neural networks (ANNs). The authors first review the use of ANNs for estimating probability density functions. They then describe a competitive neural network algorithm for unsupervised clustering based on maximum likelihood estimation. They conclude with a discussion of probabilistic modeling of sequences of random observations using a hybrid ANN/hidden Markov model.

We hope that this volume, focused on unsupervised learning algorithms, will demonstrate the significant progress that has occurred in this field in recent years. We also hope that the developments reported in this volume will motivate further research in this exciting field.

Shreveport, LA, USA M. Emre Celebi

Houston, TX, USA Kemal Aydin

Contents

Anomaly Detection for Data with Spatial Attributes

P. Deepak

Anomaly Ranking in a High Dimensional Space: The Unsupervised TreeRank Algorithm

S. Clémençon, N. Baskiotis, and N. Vayatis

Genetic Algorithms for Subset Selection in Model-Based Clustering

Luca Scrucca

Clustering Evaluation in High-Dimensional Data

Nenad Tomašev and Miloš Radovanović

Combinatorial Optimization Approaches for Data Clustering

Paola Festa

Kernel Spectral Clustering and Applications

Rocco Langone, Raghvendra Mall, Carlos Alzate, and Johan A. K. Suykens

Uni- and Multi-Dimensional Clustering Via Bayesian Networks

Omid Keivani and Jose M. Peña

A Radial Basis Function Neural Network Training Mechanism for Pattern Classification Tasks

Antonios D. Niros and George E. Tsekouras

A Survey of Constrained Clustering

Derya Dinler and Mustafa Kemal Tural

An Overview of the Use of Clustering for Data Privacy

Vicenç Torra, Guillermo Navarro-Arribas, and Klara Stokes

Nonlinear Clustering: Methods and Applications

Chang-Dong Wang and Jian-Huang Lai

Swarm Intelligence-Based Clustering Algorithms: A Survey

Tülin Inkaya, Sinan Kayalıgil, and Nur Evin Özdemirel

Extending Kmeans-Type Algorithms by Integrating Intra-cluster Compactness and Inter-cluster Separation

Xiaohui Huang, Yunming Ye, and Haijun Zhang

A Fuzzy-Soft Competitive Learning Approach for Grayscale Image Compression

Dimitrios M. Tsolakis and George E. Tsekouras

Unsupervised Learning in Genome Informatics

Ka-Chun Wong, Yue Li, and Zhaolei Zhang

The Application of LSA to the Evaluation of Questionnaire Responses

Dian I. Martin, John C. Martin, and Michael W. Berry

Mining Evolving Patterns in Dynamic Relational Networks

Rezwan Ahmed and George Karypis

Probabilistically Grounded Unsupervised Training of Neural Networks

Edmondo Trentin and Marco Bongini
