This is a decommissioned version of ERA which is running to enable completion of migration processes. All new collections and items and all edits to existing items should go to our new ERA instance at https://ualberta.scholaris.ca - Please contact us at erahelp@ualberta.ca for assistance!
- 208 views
- 225 downloads
Speedup Clustering with Hierarchical Ranking
-
- Author(s) / Creator(s)
-
Technical report TR08-09. Many clustering algorithms in particular hierarchical clustering algorithms do not scale-up well for large data-sets especially when using an expensive distance function. In this paper, we propose a novel approach to perform approximate clustering with high accuracy. We introduce the concept of a pairwise hierarchical ranking to efficiently determine close neighbors for every data object. We also propose two techniques to significantly reduce the overhead of ranking: 1) a frontier search rather than a sequential scan in the naïve ranking to reduce the search space; 2) based on this exact search, an approximate frontier search for pairwise ranking that further reduces the runtime. Empirical results on synthetic and real-life data show a speedup of up to two orders of magnitude over OPTICS while maintaining a high accuracy and up to one order of magnitude over the previously proposed DATA BUBBLES method, which also tries to speedup OPTICS by trading accuracy for speed. | TRID-ID TR08-09
-
- Date created
- 2008
-
- Subjects / Keywords
-
- Type of Item
- Report
-
- License
- Attribution 3.0 International