Column Type Annotation Using Large Language Models

  • Author / Creator
    Babamahmoudi, Amir
  • A vast amount of information on the web is stored in tabular format, making accurate table interpretation crucial for data analysis and knowledge extraction. Column Type Annotation (CTA), the process of assigning semantic types to table columns, is essential for effective table querying and understanding.

    This thesis investigates the CTA task in two parts. First, we conduct a critical evaluation of established CTA benchmarks, identifying major issues that affect how models perform on these benchmarks. Our findings reveal that once these benchmark issues are addressed, model performance drops by up to 30% compared to previously reported results.

    Second, we harness the power of Large Language Models (LLMs) for the CTA task. By employing techniques such as Retrieval-Augmented Generation (RAG) and using the models' reasoning capabilities, we demonstrate how LLMs can achieve state-of-the-art performance on CTA tasks. Our approach yields a 10% improvement over simple prompting methods, making LLMs competitive with, and in some cases superior to, current leading pre-trained models designed specifically for CTA.
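    The abstract does not describe the thesis's exact retrieval or prompting pipeline, so the following is only a minimal, hypothetical sketch of the general idea behind retrieval-augmented prompting for CTA. It assumes a small pool of already-labeled columns, retrieves the most similar ones to the target column, and places them in a few-shot prompt. It uses simple lexical Jaccard similarity in place of whatever retriever the thesis actually uses, stops before calling any LLM, and every name in it (`LabeledColumn`, `retrieve`, `build_prompt`) is illustrative rather than taken from the thesis.

    ```python
    from dataclasses import dataclass


    @dataclass
    class LabeledColumn:
        """A hypothetical labeled example: a column's cell values plus its semantic type."""
        values: list[str]
        semantic_type: str


    def tokens(values: list[str]) -> set[str]:
        """Lower-cased word tokens across all cell values of a column."""
        return {tok for v in values for tok in v.lower().split()}


    def jaccard(a: set[str], b: set[str]) -> float:
        """Jaccard similarity between two token sets (0.0 if both are empty)."""
        union = a | b
        return len(a & b) / len(union) if union else 0.0


    def retrieve(query: list[str], pool: list[LabeledColumn], k: int = 2) -> list[LabeledColumn]:
        """Return the k labeled columns whose values overlap most with the query column.

        Lexical Jaccard similarity is an assumption made for this sketch; an
        embedding-based retriever could be swapped in here.
        """
        q = tokens(query)
        ranked = sorted(pool, key=lambda c: jaccard(q, tokens(c.values)), reverse=True)
        return ranked[:k]


    def build_prompt(query: list[str], pool: list[LabeledColumn],
                     label_set: list[str], k: int = 2) -> str:
        """Assemble a few-shot CTA prompt from retrieved examples (no LLM is called)."""
        lines = ["Assign one semantic type to the column.",
                 "Allowed types: " + ", ".join(label_set), ""]
        # Retrieved examples act as in-context demonstrations.
        for ex in retrieve(query, pool, k):
            lines.append("Column: " + " | ".join(ex.values))
            lines.append("Type: " + ex.semantic_type)
            lines.append("")
        # The target column comes last, leaving the type for the model to fill in.
        lines.append("Column: " + " | ".join(query))
        lines.append("Type:")
        return "\n".join(lines)
    ```

    For a target column such as `["Berlin", "Rome", "Paris"]`, a pool containing a column of city names labeled `city` would be retrieved first, so the model sees a closely related labeled example before answering. The resulting string would then be sent to an LLM of choice.
    
    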

  • Graduation date
    Fall 2024
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-ap82-9c37
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.