Usage
  • 187 views
  • 233 downloads

Top-K Spatial Term Queries on Streaming Data

  • Author / Creator
    Sara Farazi
  • With rapidly increasing user-generated geo-tagged content in social media, location-based queries have received more attention lately. There has been extensive work on finding top-K frequent terms in specific locations from social network data streams. However, the problem reverse spatial term queries, where given a term, one wants to find the top locations where the term is frequent, has not been much studied. We study the problem of efficiently answering two types of queries on streaming data: 1) Top-k reverse frequent spatial queries, where given a term, the goal is to find top k locations where the term is frequent, and 2) Term frequency spatial queries, which is finding the expected frequency of a term in a given location. We explore a probabilistic model of term distribution that allows us to estimate the frequency of locations that are not kept in a stream sketch or summary.
    We study the back-and-forth relationship between the efficiency of queries, the efficiency of updates and the accuracy of the results and identify some sweet spots where both efficient and effective algorithms can be developed. We demonstrate that our method can also be extended to support multi-term queries. Finally, we conduct experiments on a relatively large collection of both geo-tagged tweets and geo-tagged Flickr photos to evaluate our algorithms.
    The evaluation reveals that our proposed method achieves a high accuracy when only a limited amount of memory is given. Also, the query time can be improved, compared to a baseline, by 2-3 orders of magnitude without much loss in accuracy and that the update time can further be improved by at least an order of magnitude under some term distributions or update strategies.

  • Subjects / Keywords
  • Graduation date
    Fall 2018
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/R3KH0FF5X
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.