Extending Tables using a Web Table Corpus

  • Author / Creator
    Sarabchi, Saeed
  • The web contains a large volume of tables that provide structured information about entities and relationships. This data can serve as a source for exploratory searches and for gathering information about desired entities. This thesis focuses on one particular exploratory search: given a query table and a corpus of web tables, the goal is to find a ranked list of additional columns (from the table corpus) that describe the entities of the query table. We refer to this task as “table extension.”

    Table extension poses several challenges. A main challenge is that, in the absence of schema information for web tables, it is often unclear which tables and/or columns are relevant to the query. In addition, multiple related columns may represent the same concept, which can lead to duplicate columns in the extended table. In this thesis, we propose a 5-step framework to address these challenges. Our framework establishes functional dependency relationships between columns and uses those dependencies to identify more appropriate extensions. Duplicate columns are also detected and consolidated through clustering. We evaluate our framework on a publicly available gold standard containing 233 web tables, using DBpedia as ground truth. Our evaluation reveals that the number of unique relevant columns extended by our proposed solution is on average three times that of two state-of-the-art baselines. Furthermore, the precision of extending a table using our method is higher than that of both baselines, meaning that fewer irrelevant columns are retrieved.

  • Subjects / Keywords
  • Graduation date
    Spring 2020
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-n28t-a104
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.