Text-to-SQL Systems in the Era of Advanced Large Language Models

  • Author / Creator
    Pourreza, Mohammadreza
  • Text-to-SQL conversion, the process of transforming natural language queries into executable SQL commands, stands at the forefront of bridging human linguistic capabilities with the structured logic of databases. This dissertation aims to narrow the performance gap between human experts and automated text-to-SQL systems built on large language models (LLMs). The work proceeds in three stages. First, we use cutting-edge proprietary LLMs such as GPT-4 and enhance them with an in-context learning methodology tailored explicitly to text-to-SQL tasks. The proposed method achieves state-of-the-art text-to-SQL results, improving on previous work by 5% in execution accuracy. Second, recognizing the importance of privacy and the cost of proprietary LLMs, we introduce a decomposed, two-stage supervised fine-tuning approach. This method improves the efficiency of smaller LLMs and brings their performance on par with that of their larger counterparts: with the proposed two-stage method, a small LLM with 7B parameters achieves results comparable to GPT-4. Finally, we examine and critique existing text-to-SQL benchmarks using human annotation and Standard SQL validation. This analysis reveals critical limitations that can hinder further progress in the domain and highlights the need for more comprehensive and accurate evaluation frameworks. By proposing LLM-based methodologies and identifying areas ripe for further work, this thesis aims to move closer to human-level proficiency in text-to-SQL translation.

  • Graduation date
    Fall 2024
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-pyr1-f354
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.