Semantic Annotation of Mixed-unit Numeric Data

  • Author / Creator
    Khorram Nazari, Amir Behrad
  • Abstract
    A significant portion of the quantitative information about entities in open-sourced data and data lakes is presented in tabular form, yet these tables often lack consistent labels and schemas, which complicates querying and integration. This thesis addresses the challenge of identifying and annotating numerical columns that may combine data from multiple sources with inconsistent units. For instance, weight measurements might be expressed in kilograms or pounds without any clear indication of the unit. We propose a robust method for annotating mixed-unit numeric data, develop a benchmark for this task, and introduce an algorithm that accurately detects semantic types (e.g., height) and links them to the corresponding types in a knowledge graph. Our method outperforms state-of-the-art techniques, particularly in detecting mixed units and assigning appropriate semantic labels. An evaluation on mixed-unit columns of varying complexity confirms that the approach improves annotation accuracy and provides new insights into annotating such columns, a problem that has not been thoroughly explored in previous work.

  • Subjects / Keywords
  • Graduation date
    Fall 2024
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-wy6f-ez63
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.