Semantic Annotation of Numerical Data in Web Tables

  • Author / Creator
    Su, Yuchen
  • Abstract
    A large portion of the quantitative information about entities mentioned in Web pages is expressed in Web tables. These tables often lack a proper schema and annotations, which makes querying and further analysis difficult. In this thesis, we study the problem of annotating the numerical columns of Web tables by linking them to properties in a knowledge graph.

    Unlike some approaches in the literature, which rely on contextual information (such as column headers and captions) that can be missing or unreliable, or on labeled training data that can be difficult to obtain, our approach relies only on the semantic information readily available in knowledge graphs. We show that our approach can reliably detect both semantic types (e.g., height) and unit labels (e.g., centimeters) when the semantic type is present in the knowledge graph.

    Our evaluation on real-world Web table data shows that our method outperforms some state-of-the-art semantic labeling approaches in terms of precision and F1 score. The evaluation also provides insight into the precision of unit detection, a problem that, to the best of our knowledge, no previous work has explored.

  • Subjects / Keywords
  • Graduation date
    Fall 2021
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.