Speedup Clustering with Hierarchical Ranking

Sander, Joerg; Zhou, Jianjun

doi:10.7939/R3MC8RH41

Communities and Collections

Computing Science, Department of / Technical Reports (Computing Science)

Author(s) / Creator(s)
- Sander, Joerg
- Zhou, Jianjun
Technical report TR08-09. Many clustering algorithms, in particular hierarchical clustering algorithms, do not scale up well to large data sets, especially when using an expensive distance function. In this paper, we propose a novel approach to perform approximate clustering with high accuracy. We introduce the concept of a pairwise hierarchical ranking to efficiently determine close neighbors for every data object. We also propose two techniques to significantly reduce the overhead of ranking: 1) a frontier search, rather than the sequential scan used in the naïve ranking, to reduce the search space; 2) based on this exact search, an approximate frontier search for pairwise ranking that further reduces the runtime. Empirical results on synthetic and real-life data show a speedup of up to two orders of magnitude over OPTICS while maintaining high accuracy, and of up to one order of magnitude over the previously proposed DATA BUBBLES method, which also tries to speed up OPTICS by trading accuracy for speed.
Date created

2008
Subjects / Keywords
- Hierarchical ranking
- Clustering algorithms
Type of Item

Report
DOI

https://doi.org/10.7939/R3MC8RH41
License

Attribution 3.0 International

Language
- English