Enhancing Search Query Understanding with Deep Learning

  • Author / Creator
    Han, Xuefei
  • Search query understanding is an active research topic in the field of Information Retrieval (IR).
    The goal is to learn higher-level representations for the intents or concepts behind a search query and to use these representations to enhance downstream services such as content recommendation.
    Search query understanding presents several challenges.
    The first is the Lexical Chasm problem: the surrounding context of a query cannot be accurately established by considering only the words in the query.
    Second, we need an efficient way to build context representations for open-domain search queries, representations that must encapsulate massive amounts of entities and knowledge.
    Third, we must ensure that downstream tasks indeed benefit from these representations.

    The rapid advancement of deep learning models introduces new possibilities for tackling these challenges. In this thesis, we begin by investigating whether word-by-word deep generative models offer a feasible alternative approach to query understanding and recommendation. We first attempt to generate search queries directly from long news documents, a capability of great value to search engines and recommenders for locating potential target users and ranking content. By combining a hierarchical Recurrent Neural Network (RNN) encoder with a sentence-level and a keyword-level Graph Convolutional Network (GCN), we build structural document representations. A Transformer-based decoder incorporates each feature stream through the Multi-Head Attention mechanism.
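
    As a concrete illustration of the fusion step, the following is a minimal PyTorch sketch of a decoder block that cross-attends to several encoder feature streams (word-level RNN states, sentence-level and keyword-level GCN outputs) through separate Multi-Head Attention modules. All names, dimensions, and the residual fusion order are illustrative assumptions, not the thesis implementation.

    import torch
    import torch.nn as nn

    class MultiStreamDecoderLayer(nn.Module):
        """One decoder block that cross-attends to several encoder streams."""

        def __init__(self, d_model=256, n_heads=4, n_streams=3):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            # One cross-attention module per encoder feature stream.
            self.cross_attns = nn.ModuleList(
                nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                for _ in range(n_streams))
            self.ffn = nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                nn.Linear(4 * d_model, d_model))
            self.norms = nn.ModuleList(
                nn.LayerNorm(d_model) for _ in range(n_streams + 2))

        def forward(self, tgt, streams):
            # Causal self-attention over the partially generated query.
            L = tgt.size(1)
            causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
            x, _ = self.self_attn(tgt, tgt, tgt, attn_mask=causal)
            x = self.norms[0](tgt + x)
            # Residually incorporate each feature stream in turn.
            for i, (attn, mem) in enumerate(zip(self.cross_attns, streams)):
                y, _ = attn(x, mem, mem)
                x = self.norms[i + 1](x + y)
            return self.norms[-1](x + self.ffn(x))

    # Toy usage: three feature streams of different lengths, batch size 2.
    layer = MultiStreamDecoderLayer()
    tgt = torch.randn(2, 5, 256)                            # decoder inputs so far
    streams = [torch.randn(2, n, 256) for n in (40, 8, 12)]
    out = layer(tgt, streams)                               # -> (2, 5, 256)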

    Next, we study generative query recommendation from short inputs, e.g., queries and document titles. We partition the task of query generation into two simpler sub-problems: relevant-word discovery and context-aware query generation.
    In the first stage, an RNN-based Relevant Words Generator shortlists a dynamic vocabulary of contextually relevant words, which eases the learning process for the attentional Sequence-to-Sequence (Seq2Seq) model in the second stage, as sketched below. Overall, our proposed framework achieves better performance and alleviates the high resource consumption typical of generative language models.
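
    The two-stage split can be illustrated in a few lines: stage one scores the full vocabulary against the encoded input and keeps the top-k words, and stage two computes decoding logits only over that shortlist. The helper names and sizes below are hypothetical stand-ins for the Relevant Words Generator and the Seq2Seq decoder, not the thesis code.

    import torch
    import torch.nn.functional as F

    VOCAB, D = 10_000, 128

    def shortlist_relevant_words(encoder_state, relevance_head, k=500):
        """Stage 1: score every word against the encoded input and keep the
        top-k as the dynamic vocabulary."""
        scores = relevance_head(encoder_state)              # (batch, VOCAB)
        return scores.topk(k, dim=-1).indices               # (batch, k)

    def restricted_step(decoder_state, output_embed, dynamic_vocab):
        """Stage 2: one decoding step with logits computed only over the
        shortlisted words."""
        shortlisted = output_embed[dynamic_vocab]           # (batch, k, D)
        logits = torch.einsum("bd,bkd->bk", decoder_state, shortlisted)
        probs = F.softmax(logits, dim=-1)                   # distribution over k words
        next_local = probs.argmax(dim=-1)                   # position inside the shortlist
        return dynamic_vocab.gather(1, next_local.unsqueeze(1)).squeeze(1)

    # Toy usage with random tensors in place of trained networks.
    relevance_head = torch.nn.Linear(D, VOCAB)
    output_embed = torch.randn(VOCAB, D)                    # output word embeddings
    enc = torch.randn(2, D)                                 # encoded query / title
    dyn = shortlist_relevant_words(enc, relevance_head)     # (2, 500) word ids
    dec = torch.randn(2, D)                                 # decoder hidden state
    next_token = restricted_step(dec, output_embed, dyn)    # global word ids

    In this sketch, the expensive projection over the full vocabulary is replaced by a dot product with only the shortlisted embeddings, which is one way such a split can reduce resource consumption.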

    Finally, we study the problem of relatedness matching between a search query and a large set of high-level concepts.
    We reuse the Relevant Words Generator from the previous framework as an enhanced shortlisting scheme and meta-fine-tune a BERT matching model for fine-grained relatedness classification. By employing four closely related tasks and training with the Reptile algorithm, we achieve zero-shot transfer to the problem of query-concept matching; a sketch of the Reptile update follows.
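
    For reference, here is a minimal sketch of one Reptile outer step, assuming a generic task object with a sample_batch method; the model stands in for the BERT matcher and all hyperparameters are illustrative.

    import copy
    import torch
    import torch.nn.functional as F

    def reptile_meta_step(model, tasks, inner_steps=5, inner_lr=1e-3, meta_lr=0.1):
        """One Reptile outer step: fine-tune a copy of the weights on each
        auxiliary task, then move the shared weights toward the average of
        the task-adapted weights."""
        base = copy.deepcopy(model.state_dict())
        delta = {k: torch.zeros_like(v, dtype=torch.float) for k, v in base.items()}
        for task in tasks:                                  # e.g., the four related tasks
            model.load_state_dict(base)                     # restart from shared weights
            opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
            for _ in range(inner_steps):                    # inner-loop fine-tuning
                x, y = task.sample_batch()                  # assumed task interface
                loss = F.cross_entropy(model(x), y)
                opt.zero_grad(); loss.backward(); opt.step()
            for k, v in model.state_dict().items():
                delta[k] += (v - base[k]).float() / len(tasks)
        # Outer update: theta <- theta + meta_lr * mean over tasks of (theta_task - theta).
        model.load_state_dict({k: base[k] + meta_lr * delta[k] for k in base})

    Because the outer update only averages weight differences across the auxiliary tasks, the meta-trained weights can be applied directly to a task never seen during training, which is how Reptile-style meta-training can support zero-shot transfer to query-concept matching.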

    On real-world datasets provided by our industry research partner, Tencent, we show that deep learning models learn better representations for search queries and that our approaches outperform many popular baselines. Furthermore, we conduct ablation studies and case studies to verify the usefulness of each proposed component.

  • Subjects / Keywords
  • Graduation date
    Fall 2019
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-r6mk-n194
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.