Finding, Evaluating and Exploring Clustering Alternatives Unsupervised and Semi-supervised

Moulavi, Davoud

doi:10.7939/R3D679

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Moulavi, Davoud
Abstract

Clustering aims at grouping data objects into meaningful clusters using no (or only a small amount of) supervision. This thesis studies two major clustering paradigms: density-based and semi-supervised clustering. Density-based clustering algorithms seek partitions with high-density areas of points (clusters that are not necessarily globular) separated by low-density areas that may contain noise objects. Semi-supervised clustering algorithms use a small amount of information about data to guide the clustering task.
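
To make the density-based notion above concrete, the following is a minimal sketch of a classic DBSCAN-style procedure. It is a generic illustration, not one of the algorithms proposed in this thesis; the function name `dbscan` and the parameters `eps` and `min_pts` are assumptions chosen for the example.

```python
from math import dist

NOISE = -1  # label for objects that fall in low-density areas


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            # Not dense enough to start a cluster; may become a border point later.
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
        cluster += 1
    return labels
```

With two well-separated groups of points and one isolated point, such a procedure would be expected to return two clusters and mark the isolated point as noise, regardless of the shape of each group.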

In the context of density-based clustering, we study (a) the validation of density-based clustering and (b) hierarchical density-based clustering.

The validation of density-based clustering, i.e., the objective and quantitative assessment of clustering results, is one of the most challenging aspects of clustering. Numerous relative validity criteria have been proposed for the validation of globular clusters. Not all data, however, are composed of globular clusters. We propose a relative density-based validation index, DBCV, that assesses the quality of a clustering with arbitrarily shaped clusters based on the relative density connection between pairs of objects. Our index is formulated on the basis of a new kernel density function, which is used to compute the density of objects and to evaluate the within- and between-cluster density connectedness of clustering results.

In addition to DBCV, we make several major contributions in the area of hierarchical density-based clustering. We improve on the AUTO-HDS framework for automated clustering and visualization of biological data sets by removing a parameter, thereby making the cluster extraction stage simpler and more accurate. We also propose a theoretically and practically improved general hierarchical density-based clustering method, called GHDBSCAN, which generalizes density-based clustering by identifying its essential components. Based on this generalization, we propose two algorithms, GHDBSCAN(NMRD) and GHDBSCAN(NMRD+PF), which improve over previous state-of-the-art methods both theoretically and practically.

Regarding semi-supervised clustering, we use the knowledge available about a dataset in the form of constraints to guide the clustering algorithm. In this context, we provide two approaches for model selection that allow the user to select the best model based on a few constraints and/or the DBCV value. We also discuss a framework for extracting a partitional clustering from a hierarchical clustering tree.
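
As a rough illustration of constraint-based model selection, the sketch below scores candidate clusterings by the fraction of pairwise constraints they satisfy and picks the best one. This is a simplified stand-in, not the selection procedure developed in the thesis; the must-link/cannot-link representation and the names `constraint_score` and `select_model` are assumptions made for the example.

```python
def constraint_score(labels, must_link, cannot_link):
    """Fraction of pairwise constraints that a labeling satisfies."""
    # A must-link pair is satisfied when both objects share a cluster label.
    satisfied = sum(labels[a] == labels[b] for a, b in must_link)
    # A cannot-link pair is satisfied when the labels differ.
    satisfied += sum(labels[a] != labels[b] for a, b in cannot_link)
    total = len(must_link) + len(cannot_link)
    return satisfied / total if total else 0.0


def select_model(candidates, must_link, cannot_link):
    """Return the name of the candidate labeling with the highest score."""
    return max(
        candidates,
        key=lambda name: constraint_score(candidates[name], must_link, cannot_link),
    )
```

Here `candidates` maps a model name (for example, a parameter setting) to the list of cluster labels it produced. Ties are resolved in favor of the first candidate encountered, which is a design shortcut of this sketch rather than a recommendation.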
Graduation date

Fall 2014
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/R3D679
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Sander, Jörg (Computing Science)
Examining committee members and their departments
- Zaïane, Osmar R. (Computing Science)
- Greiner, Russell (Computing Science)
- Campello, Ricardo J.G.B. (Computer Sciences)
- Spiliopoulou, Myra (Computer Science)