Enhancing Search Query Understanding with Deep Learning

  • Author / Creator
    Han, Xuefei
  • Search query understanding is an active research topic in the field of Information Retrieval (IR).
    The goal is to learn higher-level representations for the intents or concepts behind a search query and to use these representations to enhance downstream services such as content recommendation.
    Search query understanding presents several challenges.
    The first is the Lexical Chasm problem: the surrounding context of a query cannot be accurately established by considering only the words in the query.
    Second, we need an efficient way to build context representations for open-domain search queries, representations that must encapsulate massive amounts of entities and knowledge.
    Third, we must ensure that downstream tasks indeed benefit from these representations.

    The rapid advancement of deep learning models introduces new possibilities for tackling these challenges. In this thesis, we begin by investigating whether word-by-word deep generative models offer a feasible alternative approach to query understanding and recommendation. We first attempt to generate search queries directly from long news documents, a capability of great value to search engines and recommenders for locating potential target users and ranking content. By combining a hierarchical Recurrent Neural Network (RNN) encoder with a sentence-level and a keyword-level Graph Convolutional Network (GCN), we build structural document representations. A Transformer-based decoder incorporates each feature stream through the Multi-Head Attention mechanism.
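
    As a concrete illustration of the fusion step, the following is a minimal PyTorch sketch of a decoder block that cross-attends to several encoder feature streams (word-level RNN states, sentence-level and keyword-level GCN outputs) through separate Multi-Head Attention modules. All names, dimensions, and the residual fusion order are illustrative assumptions, not the thesis implementation.

    import torch
    import torch.nn as nn

    class MultiStreamDecoderLayer(nn.Module):
        """One decoder block that cross-attends to several encoder streams."""

        def __init__(self, d_model=256, n_heads=4, n_streams=3):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            # One cross-attention module per encoder feature stream.
            self.cross_attns = nn.ModuleList(
                nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                for _ in range(n_streams))
            self.ffn = nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                nn.Linear(4 * d_model, d_model))
            self.norms = nn.ModuleList(
                nn.LayerNorm(d_model) for _ in range(n_streams + 2))

        def forward(self, tgt, streams):
            # Causal self-attention over the partially generated query.
            L = tgt.size(1)
            causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
            x, _ = self.self_attn(tgt, tgt, tgt, attn_mask=causal)
            x = self.norms[0](tgt + x)
            # Residually incorporate each feature stream in turn.
            for i, (attn, mem) in enumerate(zip(self.cross_attns, streams)):
                y, _ = attn(x, mem, mem)
                x = self.norms[i + 1](x + y)
            return self.norms[-1](x + self.ffn(x))

    # Toy usage: three feature streams of different lengths, batch size 2.
    layer = MultiStreamDecoderLayer()
    tgt = torch.randn(2, 5, 256)                            # decoder inputs so far
    streams = [torch.randn(2, n, 256) for n in (40, 8, 12)]
    out = layer(tgt, streams)                               # -> (2, 5, 256)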

    Next, we study generative query recommendation from short inputs, e.g., queries and document titles. We partition the task of query generation into two simpler sub-problems: relevant-word discovery and context-aware query generation.
    In the first stage, an RNN-based Relevant Words Generator shortlists a dynamic vocabulary of contextually relevant words, which eases the learning process for the attentional Sequence-to-Sequence (Seq2Seq) model in the second stage, as sketched below. Overall, our proposed framework achieves better performance and alleviates the high resource consumption typical of generative language models.
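
    The two-stage split can be illustrated in a few lines: stage one scores the full vocabulary against the encoded input and keeps the top-k words, and stage two computes decoding logits only over that shortlist. The helper names and sizes below are hypothetical stand-ins for the Relevant Words Generator and the Seq2Seq decoder, not the thesis code.

    import torch
    import torch.nn.functional as F

    VOCAB, D = 10_000, 128

    def shortlist_relevant_words(encoder_state, relevance_head, k=500):
        """Stage 1: score every word against the encoded input and keep the
        top-k as the dynamic vocabulary."""
        scores = relevance_head(encoder_state)              # (batch, VOCAB)
        return scores.topk(k, dim=-1).indices               # (batch, k)

    def restricted_step(decoder_state, output_embed, dynamic_vocab):
        """Stage 2: one decoding step with logits computed only over the
        shortlisted words."""
        shortlisted = output_embed[dynamic_vocab]           # (batch, k, D)
        logits = torch.einsum("bd,bkd->bk", decoder_state, shortlisted)
        probs = F.softmax(logits, dim=-1)                   # distribution over k words
        next_local = probs.argmax(dim=-1)                   # position inside the shortlist
        return dynamic_vocab.gather(1, next_local.unsqueeze(1)).squeeze(1)

    # Toy usage with random tensors in place of trained networks.
    relevance_head = torch.nn.Linear(D, VOCAB)
    output_embed = torch.randn(VOCAB, D)                    # output word embeddings
    enc = torch.randn(2, D)                                 # encoded query / title
    dyn = shortlist_relevant_words(enc, relevance_head)     # (2, 500) word ids
    dec = torch.randn(2, D)                                 # decoder hidden state
    next_token = restricted_step(dec, output_embed, dyn)    # global word ids

    In this sketch, the expensive projection over the full vocabulary is replaced by a dot product with only the shortlisted embeddings, which is one way such a split can reduce resource consumption.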

    Finally, we study the problem of relatedness matching between a search query and a large set of high-level concepts.
    We reuse the Relevant Words Generator from the previous framework as an enhanced shortlisting scheme and meta-fine-tune a BERT matching model for fine-grained relatedness classification. By employing four closely related tasks and training with the Reptile algorithm, we achieve zero-shot transfer to the problem of query-concept matching; a sketch of the Reptile update follows.
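
    For reference, here is a minimal sketch of one Reptile outer step, assuming a generic task object with a sample_batch method; the model stands in for the BERT matcher and all hyperparameters are illustrative.

    import copy
    import torch
    import torch.nn.functional as F

    def reptile_meta_step(model, tasks, inner_steps=5, inner_lr=1e-3, meta_lr=0.1):
        """One Reptile outer step: fine-tune a copy of the weights on each
        auxiliary task, then move the shared weights toward the average of
        the task-adapted weights."""
        base = copy.deepcopy(model.state_dict())
        delta = {k: torch.zeros_like(v, dtype=torch.float) for k, v in base.items()}
        for task in tasks:                                  # e.g., the four related tasks
            model.load_state_dict(base)                     # restart from shared weights
            opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
            for _ in range(inner_steps):                    # inner-loop fine-tuning
                x, y = task.sample_batch()                  # assumed task interface
                loss = F.cross_entropy(model(x), y)
                opt.zero_grad(); loss.backward(); opt.step()
            for k, v in model.state_dict().items():
                delta[k] += (v - base[k]).float() / len(tasks)
        # Outer update: theta <- theta + meta_lr * mean over tasks of (theta_task - theta).
        model.load_state_dict({k: base[k] + meta_lr * delta[k] for k in base})

    Because the outer update only averages weight differences across the auxiliary tasks, the meta-trained weights can be applied directly to a task never seen during training, which is how Reptile-style meta-training can support zero-shot transfer to query-concept matching.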

    On real-world datasets provided by our industry research partner, Tencent, we show that deep learning models learn better representations for search queries and that our approaches outperform many popular baselines. Furthermore, we conduct ablation studies and case studies to verify the usefulness of each proposed component.

  • Subjects / Keywords
  • Graduation date
    Fall 2019
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-r6mk-n194
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.