ERA

Permanent link (DOI): https://doi.org/10.7939/R3D679

Communities

This file is in the following communities:

Graduate Studies and Research, Faculty of

Collections

This file is in the following collections:

Theses and Dissertations

Finding, Evaluating and Exploring Clustering Alternatives Unsupervised and Semi-supervised Open Access

Descriptions

Other title
Subject/Keyword
Hierarchical Density-Based Clustering
Density-Based Clustering Validation
Density-Based Clustering
Semi-supervised Clustering
Semi-supervised Model Selection
Type of item
Thesis
Degree grantor
University of Alberta
Author or creator
Moulavi, Davoud
Supervisor and department
Sander, Jörg (Computing Science)
Examining committee member and department
Zaïane, Osmar R. (Computing Science)
Greiner, Russell (Computing Science)
Campello, Ricardo J.G.B. (Computer Sciences)
Spiliopoulou, Myra (Computer Science)
Department
Department of Computing Science
Specialization

Date accepted
2014-09-25T15:52:01Z
Graduation date
2014-11
Degree
Doctor of Philosophy
Degree level
Doctoral
Abstract
Clustering aims at grouping data objects into meaningful clusters using no (or only a small amount of) supervision. This thesis studies two major clustering paradigms: density-based and semi-supervised clustering. Density-based clustering algorithms seek partitions with high-density areas of points (clusters that are not necessarily globular) separated by low-density areas that may contain noise objects. Semi-supervised clustering algorithms use a small amount of information about the data to guide the clustering task. In the context of density-based clustering, we study (a) the validation of density-based clustering and (b) hierarchical density-based clustering. The validation of clustering results, i.e., their objective and quantitative assessment, is one of the most challenging aspects of clustering. Numerous relative validity criteria have been proposed for the validation of globular clusters. Not all data, however, are composed of globular clusters. We propose a relative density-based validation index, DBCV, that assesses the quality of a clustering with arbitrarily shaped clusters based on the relative density connection between pairs of objects. Our index is formulated on the basis of a new kernel density function, which is used to compute the density of objects and to evaluate the within- and between-cluster density connectedness of clustering results.
In addition to DBCV, we make several major contributions in the area of hierarchical density-based clustering. We improve on the AUTO-HDS framework for automated clustering and visualization of biological data sets by removing a parameter, thereby making the cluster extraction stage simpler and more accurate. We also propose a theoretically and practically improved general hierarchical density-based clustering approach, called GHDBSCAN, which generalizes density-based clustering by identifying its essential components. Based on this generalization, we propose two algorithms, GHDBSCAN(NMRD) and GHDBSCAN(NMRD+PF), which improve over previous state-of-the-art methods both theoretically and practically.
Regarding semi-supervised clustering, we use the knowledge available about a dataset in the form of constraints to guide the clustering algorithm. In this context, we provide two approaches for model selection that allow the user to select the best model based on a few constraints and/or the DBCV value, and we also discuss a framework for extracting a partitional clustering from a hierarchical clustering tree.
Language
English
DOI
doi:10.7939/R3D679
Rights
Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
Citation for previous publication
D. Moulavi, P.A. Jaskowiak, R.J.G.B. Campello, A. Zimek, J. Sander. Density-Based Clustering Validation. Proc. of the 2014 SIAM International Conference on Data Mining (SDM), Philadelphia, PA, USA, 2014.
M. Pourrajabi, D. Moulavi, R.J.G.B. Campello, A. Zimek, J. Sander, R. Goebel. Model Selection for Semi-Supervised Clustering. Proc. of the 17th Int. Conf. on Extending Database Technology (EDBT), Athens, Greece, 2014.
R.J.G.B. Campello, D. Moulavi, J. Sander. A Simpler and More Accurate AUTO-HDS Framework for Clustering and Visualization of Biological Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Vol. 9, No. 6, 1850-1852, 2012.
R.J.G.B. Campello, D. Moulavi, J. Sander. Density-Based Clustering Based on Hierarchical Density Estimates. Proc. Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD), LNAI 7819, Gold Coast, Australia, 2013, 160-172.
R.J.G.B. Campello, D. Moulavi, A. Zimek, J. Sander. A Framework for Semi-Supervised and Unsupervised Optimal Extraction of Clusters from Hierarchies. Data Mining and Knowledge Discovery (DMKD), Vol. 27, 344-371, 2013.

File Details

Date Uploaded
Date Modified
2014-11-15T08:19:53.250+00:00
Audit Status
Audits have not yet been run on this file.
Characterization
File format: pdf (PDF/A)
Mime type: application/pdf
File size: 1899388
Last modified: 2015:10:12 11:55:30-06:00
Filename: Moulavi_Davoud_201409_PhD.pdf
Original checksum: 7bf777e7ab4331a315606d9d886aff18
Well formed: true
Valid: true
Status message: Too many fonts to report; some fonts omitted. Total fonts = 1085
Page count: 167