- 122 views
- 95 downloads
Speedup Clustering with Hierarchical Ranking
-
- Author(s) / Creator(s)
-
Technical report TR08-09. Many clustering algorithms in particular hierarchical clustering algorithms do not scale-up well for large data-sets especially when using an expensive distance function. In this paper, we propose a novel approach to perform approximate clustering with high accuracy. We introduce the concept of a pairwise hierarchical ranking to efficiently determine close neighbors for every data object. We also propose two techniques to significantly reduce the overhead of ranking: 1) a frontier search rather than a sequential scan in the naïve ranking to reduce the search space; 2) based on this exact search, an approximate frontier search for pairwise ranking that further reduces the runtime. Empirical results on synthetic and real-life data show a speedup of up to two orders of magnitude over OPTICS while maintaining a high accuracy and up to one order of magnitude over the previously proposed DATA BUBBLES method, which also tries to speedup OPTICS by trading accuracy for speed. | TRID-ID TR08-09
-
- Date created
- 2008
-
- Subjects / Keywords
-
- Type of Item
- Report
-
- License
- Attribution 3.0 International