Unsupervised Learning Algorithms
M. Emre Celebi • Kemal Aydin
Editors
Editors
M. Emre Celebi
Department of Computer Science
Louisiana State University in Shreveport
Shreveport, LA, USA
Kemal Aydin
North American University
Houston, TX, USA
ISBN 978-3-319-24209-5 ISBN 978-3-319-24211-8 (eBook)
DOI 10.1007/978-3-319-24211-8
Library of Congress Control Number: 2015060229
Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made.
Printed on acid-free paper
Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)
Preface
With the proliferation of massive amounts of unlabeled data, unsupervised learning algorithms, which can automatically discover interesting and useful patterns in such data, have gained popularity among researchers and practitioners. These algorithms have found numerous applications including pattern recognition, market basket analysis, web mining, social network analysis, information retrieval, recommender systems, market research, intrusion detection, and fraud detection. The difficulty of developing theoretically sound approaches that are amenable to objective evaluation has resulted in the proposal of numerous unsupervised learning algorithms over the past half-century.
The goal of this volume is to summarize the state of the art in unsupervised learning. The intended audience includes researchers and practitioners who are increasingly using unsupervised learning algorithms to analyze their data.
This volume opens with two chapters on anomaly detection. In "Anomaly Detection for Data with Spatial Attributes," P. Deepak reviews anomaly detection techniques for spatial data developed in the data mining and statistics communities. The author presents a taxonomy of such techniques, describes the most representative ones, and discusses the applications of clustering and image segmentation to anomaly detection.
In "Anomaly Ranking in a High Dimensional Space: The Unsupervised TreeRank Algorithm," Clémençon et al. describe a computationally efficient anomaly ranking algorithm based on the minimization of the mass-volume criterion. This algorithm does not involve any sampling; therefore, it is especially suited for large and high-dimensional data.
Clustering is undoubtedly the most well-known subfield of unsupervised learning. The volume continues with 12 chapters on clustering.
In "Genetic Algorithms for Subset Selection in Model-Based Clustering," Scrucca describes a genetic algorithm that maximizes the Bayesian information criterion (BIC) to select a subset of relevant features for model-based clustering. In particular, the criterion is based on the BIC difference between a candidate clustering model for the given subset and a model that assumes no clustering for the same subset. The implementation of this algorithm uses the facilities available in the GA package for the open-source statistical computing environment R.
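The BIC-difference criterion mentioned above can be illustrated with a minimal Python sketch. It is not Scrucca's implementation (which, per the text, relies on the GA package in R); it assumes a single numeric feature, a two-component Gaussian mixture as the candidate clustering model, a single Gaussian as the no-clustering model, and the "larger is better" form of BIC. All function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' form (an assumed convention)."""
    return 2.0 * loglik - n_params * math.log(n_obs)

def bic_no_clustering(xs):
    """BIC of one Gaussian fitted by maximum likelihood (2 parameters)."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    loglik = sum(math.log(max(normal_pdf(x, mean, var), 1e-300)) for x in xs)
    return bic(loglik, 2, n)

def bic_two_clusters(xs, n_iter=100):
    """BIC of a two-component Gaussian mixture fitted by a basic EM loop."""
    n = len(xs)
    ordered = sorted(xs)
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    # Crude initialisation: means at the quartiles, shared variance.
    mu = [ordered[n // 4], ordered[(3 * n) // 4]]
    var = [var0, var0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = max(sum(p), 1e-300)
            resp.append([pk / total for pk in p])
        # M-step: update weights, means, and variances.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(
                sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6
            )
    loglik = sum(
        math.log(max(sum(w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)), 1e-300))
        for x in xs
    )
    # Free parameters: 2 means + 2 variances + 1 mixing weight.
    return bic(loglik, 5, n)

def bic_difference(xs):
    """Positive values favour the clustering model for this feature."""
    return bic_two_clusters(xs) - bic_no_clustering(xs)

feature = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 9.0, 9.2, 9.4, 9.6, 9.8, 10.0]
print(bic_difference(feature))
```

Under these assumptions, a feature with clear group structure yields a positive difference, so a search procedure such as a genetic algorithm could use the difference as the fitness of a candidate feature subset.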
In "Clustering Evaluation in High-Dimensional Data," Tomašev and Radovanović investigate the performance of popular cluster quality assessment indices on synthetically generated high-dimensional Gaussian data. Extensive experiments reveal that dimensionality and degree of cluster overlap can affect both the mean quality score assigned by an index and the stability of its quality estimation. The authors also discover that appropriate treatment of hub points may improve the quality assessment process.
In "Combinatorial Optimization Approaches for Data Clustering," Festa presents an overview of clustering algorithms with particular emphasis on algorithms based on combinatorial optimization. The author first reviews various mathematical programming formulations of the partitional clustering problem and some exact methods based on them. She then provides a brief survey of partitional clustering algorithms based on heuristics or metaheuristics.
In "Kernel Spectral Clustering and Applications," Langone et al. present a survey of the recently proposed kernel spectral clustering (KSC) algorithm. The authors describe the basic KSC algorithm as well as its probabilistic, hierarchical, and sparse extensions. They also provide an overview of the various applications of the algorithm such as text clustering, image segmentation, power load clustering, and community detection in big data networks.
In "Uni- and Multi-Dimensional Clustering via Bayesian Networks," Keivani and Peña discuss model-based clustering using Bayesian networks. For unidimensional clustering, the authors propose the use of the Bayesian structural clustering (BSC) algorithm, which is based on the celebrated expectation-maximization algorithm. For the multidimensional case, the authors propose two algorithms, one based on a generalization of the BSC algorithm and the other based on multidimensional Bayesian network classification. The former algorithm turns out to be computationally demanding. So, the authors provide a preliminary evaluation of the latter on two representative data sets.
In "A Radial Basis Function Neural Network Training Mechanism for Pattern Classification Tasks," Niros and Tsekouras propose a novel approach for designing radial basis function networks (RBFNs) based on hierarchical fuzzy clustering and particle swarm optimization with discriminant analysis. The authors compare the resulting RBFN classifier against various other classifiers on popular data sets from the UCI machine learning repository.
In "A Survey of Constrained Clustering," Dinler and Tural provide an in-depth overview of the field of constrained clustering (a.k.a. semi-supervised clustering). After giving an introduction to the field of clustering, the authors first review unsupervised clustering. They then present a survey of constrained clustering, where the prior knowledge comes from either labeled data or constraints. Finally, they discuss computational complexity issues and related work in the field.
In "An Overview of the Use of Clustering for Data Privacy," Torra et al. give a brief overview of data privacy with emphasis on the applications of clustering to data-driven methods. More specifically, they review the use of clustering in masking methods and information loss measures.
In "Nonlinear Clustering: Methods and Applications," Wang and Lai review clustering algorithms for nonlinearly separable data. The authors focus on four approaches: kernel-based clustering, multi-exemplar-based clustering, graph-based clustering, and support vector clustering. In addition to discussing representative algorithms based on each of these approaches, the authors present applications of these algorithms to computer vision tasks such as image/video segmentation and image categorization.
In "Swarm Intelligence-Based Clustering Algorithms: A Survey," Inkaya et al. present a detailed survey of algorithms for hard clustering based on swarm intelligence (SI). They categorize SI based clustering algorithms into five groups: particle swarm optimization based algorithms, ant colony optimization based algorithms, ant based sorting algorithms, hybrid algorithms, and miscellaneous algorithms. In addition, they present a novel taxonomy for SI based clustering algorithms based on agent representation.
In "Extending Kmeans-Type Algorithms by Integrating Intra-Cluster Compactness and Inter-Cluster Separation," Huang et al. propose a framework for integrating both intra-cluster compactness and inter-cluster separation criteria in k-means-type clustering algorithms. Based on their proposed framework, the authors design three novel, computationally efficient k-means-type algorithms. The performance of these algorithms is demonstrated on a variety of data sets, using several cluster quality assessment indices.
In "A Fuzzy-Soft Competitive Learning Approach for Grayscale Image Compression," Tsolakis and Tsekouras propose a novel, two-stage vector quantization algorithm that combines the merits of hard and soft vector quantization paradigms.
The first stage involves a soft competitive learning scheme with a fuzzy neighborhood function, which can measure the lateral neuron interaction phenomenon and the degree of neuron excitations. The second stage improves the partition generated in the first stage by means of a codeword migration strategy. Experimental results on classic grayscale images demonstrate the effectiveness and efficiency of the proposed algorithm in comparison with several state-of-the-art algorithms.
This volume continues with two chapters on the applications of unsupervised learning. In "Unsupervised Learning in Genome Informatics," Wong et al. review a selection of state-of-the-art unsupervised learning algorithms for genome informatics. The chapter is divided into two parts. In the first part, the authors review various algorithms for protein-DNA binding event discovery and search from sequence patterns to genome-wide levels. In the second part, several algorithms for inferring microRNA regulatory networks are presented.
In "The Application of LSA to the Evaluation of Questionnaire Responses," Martin et al. investigate the applicability of Latent Semantic Analysis (LSA) to the automated evaluation of responses to essay questions. The authors first describe the nature of essay questions. They then give a detailed overview of LSA including its historical and mathematical background, its use as an unsupervised learning system, and its applications. The authors conclude with a discussion of the application of LSA to automated essay evaluation and a case study involving a driver training system.
This volume concludes with two chapters on miscellaneous topics regarding unsupervised learning. In "Mining Evolving Patterns in Dynamic Relational Networks," Ahmed and Karypis present various practical algorithms for unsupervised analysis of the temporal evolution of patterns in dynamic relational networks.
The authors introduce various classes of dynamic patterns, which enable the identification of hidden coordination mechanisms underlying the networks, provide information on the recurrence and stability of its relational patterns, and improve the ability to predict the relations and their changes in these networks.
Finally, in "Probabilistically Grounded Unsupervised Training of Neural Networks," Trentin and Bongini present a survey of probabilistic interpretations of artificial neural networks (ANNs). The authors first review the use of ANNs for estimating probability density functions. They then describe a competitive neural network algorithm for unsupervised clustering based on maximum likelihood estimation. They conclude with a discussion of probabilistic modeling of sequences of random observations using a hybrid ANN/hidden Markov model.
We hope that this volume, focused on unsupervised learning algorithms, will demonstrate the significant progress that has occurred in this field in recent years. We also hope that the developments reported in this volume will motivate further research in this exciting field.
Shreveport, LA, USA M. Emre Celebi
Houston, TX, USA Kemal Aydin
Contents
Anomaly Detection for Data with Spatial Attributes
P. Deepak
Anomaly Ranking in a High Dimensional Space: The Unsupervised TreeRank Algorithm
S. Clémençon, N. Baskiotis, and N. Vayatis
Genetic Algorithms for Subset Selection in Model-Based Clustering
Luca Scrucca
Clustering Evaluation in High-Dimensional Data
Nenad Tomašev and Miloš Radovanović
Combinatorial Optimization Approaches for Data Clustering
Paola Festa
Kernel Spectral Clustering and Applications
Rocco Langone, Raghvendra Mall, Carlos Alzate, and Johan A. K. Suykens
Uni- and Multi-Dimensional Clustering Via Bayesian Networks
Omid Keivani and Jose M. Peña
A Radial Basis Function Neural Network Training Mechanism for Pattern Classification Tasks
Antonios D. Niros and George E. Tsekouras
A Survey of Constrained Clustering
Derya Dinler and Mustafa Kemal Tural
An Overview of the Use of Clustering for Data Privacy
Vicenç Torra, Guillermo Navarro-Arribas, and Klara Stokes
Nonlinear Clustering: Methods and Applications
Chang-Dong Wang and Jian-Huang Lai
Swarm Intelligence-Based Clustering Algorithms: A Survey
Tülin Inkaya, Sinan Kayalıgil, and Nur Evin Özdemirel
Extending Kmeans-Type Algorithms by Integrating Intra-cluster Compactness and Inter-cluster Separation
Xiaohui Huang, Yunming Ye, and Haijun Zhang
A Fuzzy-Soft Competitive Learning Approach for Grayscale Image Compression
Dimitrios M. Tsolakis and George E. Tsekouras
Unsupervised Learning in Genome Informatics
Ka-Chun Wong, Yue Li, and Zhaolei Zhang
The Application of LSA to the Evaluation of Questionnaire Responses
Dian I. Martin, John C. Martin, and Michael W. Berry
Mining Evolving Patterns in Dynamic Relational Networks
Rezwan Ahmed and George Karypis
Probabilistically Grounded Unsupervised Training of Neural Networks
Edmondo Trentin and Marco Bongini