Search Term Selection and Document Clustering for Query Suggestion

  • Author / Creator
    Zhang, Xiaomin
  • Abstract
    To improve a user's query and help the user quickly satisfy an information need, most search engines provide query suggestions that are meant to be relevant alternatives to the user's query. This thesis builds on the query suggestion system and evaluation methodology described in Shen Jiang's Master's thesis (2008). Jiang's system constructs query suggestions by searching for lexical aliases of web documents and then applying query search to the lexical aliases. A lexical alias for a web document is a list of terms that, when used as a query, returns the web document in a top-ranked position. Query search is a search process that finds useful combinations of search terms. The main focus of this thesis is to supply alternatives for the components of Jiang's system. We suggest three term scoring mechanisms and generalize Jiang's lexical alias search into a general search for terms that are useful for constructing good query suggestions. We also replace Jiang's top-down query search with a bottom-up beam search method. We experimentally show that our query suggestion method improves on Jiang's system by 30% for short queries and 90% for long queries under Jiang's evaluation method. In addition, we add new evidence supporting Jiang's conclusion that terms from the user's initial query are important to include in the query suggestions.

    We also explore the usefulness of document clustering for creating query suggestions. Our experimental results are the opposite of what we expected: in terms of the "coverage" scores we use for evaluation, query suggestion based on clustering does not perform nearly as well as our best method that is not based on document clustering.

  • Subjects / Keywords
  • Graduation date
    Spring 2011
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/R36D5R
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
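
Illustrative note on the abstract: the bottom-up beam search it mentions can be pictured with a small sketch. This is not the thesis's implementation. The toy documents, the candidate terms, the function names, and in particular the scoring function (target documents matched minus non-target documents matched) are assumptions made up for this example and are not the "coverage" score used in the thesis. The sketch only shows the general shape of the technique: start from single terms, keep the best few queries at each length, and extend them one term at a time.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Set

Query = FrozenSet[str]


def beam_search_queries(
    terms: Sequence[str],
    score: Callable[[Query], float],
    beam_width: int = 2,
    max_len: int = 3,
) -> List[Query]:
    """Bottom-up beam search over term combinations (illustrative only).

    Returns every query that was ever kept in the beam, best first.
    """
    def rank(queries: Set[Query]) -> List[Query]:
        # Highest score first; ties are broken alphabetically so the
        # result is deterministic.
        return sorted(queries, key=lambda q: (-score(q), sorted(q)))

    # Level 1: single-term queries.
    beam = rank({frozenset([t]) for t in terms})[:beam_width]
    kept: Set[Query] = set(beam)

    # Levels 2..max_len: extend each surviving query by one unused term.
    for _ in range(max_len - 1):
        candidates = {q | {t} for q in beam for t in terms if t not in q}
        if not candidates:
            break
        beam = rank(candidates)[:beam_width]
        kept.update(beam)

    return rank(kept)


# Hypothetical toy data: each document is a bag of terms.
DOCS: Dict[str, Set[str]] = {
    "d1": {"jaguar", "car", "dealer"},
    "d2": {"jaguar", "car", "price"},
    "d3": {"jaguar", "cat", "habitat"},
    "d4": {"car", "rental"},
}
TARGET_DOCS = {"d1", "d2"}  # documents the suggestion should retrieve


def toy_score(query: Query) -> float:
    """Assumed score: matched target docs minus matched non-target docs."""
    matched = {name for name, words in DOCS.items() if query <= words}
    return len(matched & TARGET_DOCS) - len(matched - TARGET_DOCS)


if __name__ == "__main__":
    ranked = beam_search_queries(["jaguar", "car", "cat", "price"], toy_score)
    for q in ranked:
        print(sorted(q), toy_score(q))
```

A small beam width keeps the number of scored queries low at the cost of possibly discarding a term that would only become useful in a longer combination; widening the beam trades more scoring work for a broader search.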