Speedup Clustering with Hierarchical Ranking

  • Author(s) / Creator(s)
  • Technical report TR08-09. Many clustering algorithms in particular hierarchical clustering algorithms do not scale-up well for large data-sets especially when using an expensive distance function. In this paper, we propose a novel approach to perform approximate clustering with high accuracy. We introduce the concept of a pairwise hierarchical ranking to efficiently determine close neighbors for every data object. We also propose two techniques to significantly reduce the overhead of ranking: 1) a frontier search rather than a sequential scan in the naïve ranking to reduce the search space; 2) based on this exact search, an approximate frontier search for pairwise ranking that further reduces the runtime. Empirical results on synthetic and real-life data show a speedup of up to two orders of magnitude over OPTICS while maintaining a high accuracy and up to one order of magnitude over the previously proposed DATA BUBBLES method, which also tries to speedup OPTICS by trading accuracy for speed. | TRID-ID TR08-09

  • Date created
  • Subjects / Keywords
  • Type of Item
  • DOI
  • License
    Attribution 3.0 International