ERA

Permanent link (DOI): https://doi.org/10.7939/R3VH5CZ0S

Communities

This file is in the following communities:

Graduate Studies and Research, Faculty of

Collections

This file is in the following collections:

Theses and Dissertations

A New Method For Semi-Supervised Density-Based Projected Clustering Open Access

Descriptions

Subject/Keyword
subspace
clustering
semi-supervised
projected
density-based
kdd
constraints
Type of item
Thesis
Degree grantor
University of Alberta
Author or creator
Jullion, Zachary M
Supervisor and department
Sander, Joerg (Computing Science)
Examining committee member and department
Sander, Joerg (Computing Science)
Campello, Ricardo (Computing Science)
Nascimento, Mario (Computing Science)
Department
Department of Computing Science
Date accepted
2017-09-29T11:32:13Z
Graduation date
2017-11:Fall 2017
Degree
Master of Science
Degree level
Master's
Abstract
Density-based clustering methods extract high-density clusters that are separated by regions of lower density. HDBSCAN* is an existing algorithm for producing a density-based cluster hierarchy. To obtain clusters from this hierarchy, it includes an instance of FOSC (Framework for Optimal Selection of Clusters), which extracts significant clusters based on a measure known as cluster stability. We introduce CASAR (Compact And Separation Adjusted Ratio), a new algorithm for extracting significant clusters from an HDBSCAN* hierarchy. CASAR is similar to FOSC, but it defines local cluster quality differently and uses a different aggregation method when comparing the quality of descendant clusters to that of their ancestors in the hierarchy. The local cluster quality that CASAR uses is based on the validation index DBCV (Density-Based Cluster Validation). CASAR is designed to extract individual density-based clusters from subspaces and is not meant to be a general-purpose replacement for cluster stability. We also introduce a new semi-supervised density-based method for finding relevant subspaces. Given a set of should-link objects that belong to an undiscovered cluster, our method finds an appropriate set of attributes for extracting that cluster. Because it relies on well-established properties of density-based clusters, our method can be used as a pre-processing step for a wide variety of density-based clustering algorithms. We combine this method with HDBSCAN* and CASAR to produce a semi-supervised density-based projected clustering algorithm. In a series of experiments, we compare CASAR and cluster stability on both synthetic and real data sets. We also compare our semi-supervised density-based projected clustering algorithm to an existing semi-supervised projected clustering algorithm and to a well-known unsupervised projected clustering algorithm.
We conclude this thesis with a summary of the strengths and weaknesses of our method, a summary of experimental findings, and a discussion about possible directions for future work.
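The abstract describes FOSC (and CASAR) as selecting clusters from an HDBSCAN* hierarchy by comparing a cluster's local quality with an aggregate of its descendants' quality. Below is a minimal sketch of that bottom-up selection rule, assuming a cluster tree whose nodes already carry a precomputed quality score (for FOSC, cluster stability). `Node`, `select_clusters`, `extract_from_hierarchy`, and the `aggregate` parameter are hypothetical names used for illustration only. With the default `sum`, the rule keeps a cluster when its quality is at least the summed best value of its descendants; otherwise it keeps the descendants' selection. The root (all objects in one cluster) is not treated as a candidate. CASAR's DBCV-based quality and its aggregation method are not specified in this record, so they are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    """A cluster in a (hypothetical) precomputed density-based cluster tree."""
    name: str
    quality: float  # local quality score, e.g. FOSC's cluster stability
    children: List["Node"] = field(default_factory=list)


def select_clusters(
    node: Node, aggregate: Callable[[List[float]], float] = sum
) -> Tuple[float, List[str]]:
    """Return (best value, selected cluster names) for the subtree at node."""
    if not node.children:
        # A leaf cluster has no descendants to compare against.
        return node.quality, [node.name]
    results = [select_clusters(child, aggregate) for child in node.children]
    descendant_value = aggregate([value for value, _ in results])
    if node.quality >= descendant_value:
        # Keep this cluster and discard everything below it.
        return node.quality, [node.name]
    # Otherwise keep the best selection found among the descendants.
    return descendant_value, [name for _, names in results for name in names]


def extract_from_hierarchy(
    root: Node, aggregate: Callable[[List[float]], float] = sum
) -> List[str]:
    """Select clusters below the root; the root itself is not a candidate."""
    selected: List[str] = []
    for child in root.children:
        selected.extend(select_clusters(child, aggregate)[1])
    return selected


if __name__ == "__main__":
    # Toy tree with made-up quality scores, for illustration only.
    root = Node("root", 0.0, [
        Node("A", 5.0, [Node("A1", 3.0), Node("A2", 4.0)]),
        Node("B", 2.0),
    ])
    print(extract_from_hierarchy(root))                 # ['A1', 'A2', 'B']
    print(extract_from_hierarchy(root, aggregate=max))  # ['A', 'B']
```

The `aggregate` hook only shows where a different aggregation could plug in. The default `sum` corresponds to the additive comparison used with cluster stability, while `max` is an arbitrary placeholder and is not CASAR's aggregation method.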
Language
English
DOI
doi:10.7939/R3VH5CZ0S
Rights
This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
File Details

Date Modified
2017-09-29T17:32:13.836+00:00
Characterization
File format: pdf (PDF/A)
Mime type: application/pdf
File size: 1184071
Last modified: 2017:11:08 16:57:54-07:00
Filename: Jullion_Zachary_M_201709_MSc.pdf
Original checksum: bc8f9acb052a0c2204ef67a2e0bf35c1
Well formed: true
Valid: true
File title: Abstract
Page count: 105
File language: en-CA