Metric Techniques for High-Dimensional Indexing

  • Author(s) / Creator(s)
  • Technical report TR04-19. Despite the proposal of numerous tree-based structures for high-dimensional similarity searches, techniques based on a sequential scan, such as the VA-File, have been shown to be quite effective. In this thesis we present three new access structures which use sequential access patterns to efficiently answer similarity queries for high-dimensional vector and metric data. Two of these access structures are designed to answer range queries, while the third access method is intended for nearest neighbor queries. The three access methods organize preprocessed data and reorder the original data sequentially on disk. At query time, portions of the data are read sequentially to prevent expensive random disk I/Os that prevent many access methods for high-dimensional data and metric data from being efficient. Experimental results show the first two proposed access structures can process range queries up to 24 times faster than the VA-File and up to 69 times faster than a sequential scan of the data set. The third proposed method is designed for nearest neighbor queries and is up to 15 times faster than the VA-File method and more than 40 times faster than a sequential scan of the data set. Unlike the VA-File, the access structures presented in this thesis work in general metric spaces, in addition to vector space (under a metric distance), scale well with increasing the radius of range queries and the number of nearest neighbors retrieved for nearest neighbor queries, and can be easily implemented. | TRID-ID TR04-19

  • Date created
  • Subjects / Keywords
  • Type of Item
  • DOI
  • License
    Attribution 3.0 International