Table Expansion: Populating Relational Web Tables Based on Examples

  • Author / Creator
    Kassenov, Zharkyn
  • Abstract
    The Web contains an enormous amount of structured data in the form of web tables, and there is great value in retrieving this data and harnessing it for decision making and for gaining insights. Finding the right data on the Web and integrating it with the existing data within an organization can be a very time-consuming task. To address this, the thesis studies the problem of table expansion: given a query table and a corpus of tables, the goal is to expand the query table with additional rows that are likely to belong to the same table. Given the challenges of querying web tables, our approach relies only on the instances in the given query table and not on the schema, which may not be present or known for tables in the corpus. It uses projections to split tables into entity-attribute binary relations (sets of key-value pairs) and then leverages co-occurrence statistics to retrieve candidate key-value pairs, which are then combined into candidate rows. Our experiments show that the constraints required by alternative approaches, such as relying on column labels and on contextual information from the web page containing the table, can negatively affect the results and, in some cases, make those approaches unsuitable for the task.
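
The pipeline described in the abstract (project tables into key-value relations, use co-occurrence with the query to retrieve candidate pairs, combine pairs into rows) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function names `project_pairs` and `expand_table`, the choice of the query's first column as the entity key, and the use of a raw overlap count as the co-occurrence score are all assumptions made for illustration.

```python
from collections import defaultdict


def project_pairs(table):
    """Split a table (list of row tuples) into binary relations.

    Returns one set of (key, value) pairs per ordered column pair, so no
    schema or key column has to be known for corpus tables.
    """
    if not table:
        return []
    width = min(len(row) for row in table)  # tolerate ragged rows
    return [
        {(row[a], row[b]) for row in table}
        for a in range(width)
        for b in range(width)
        if a != b
    ]


def expand_table(query, corpus):
    """Suggest new rows for `query` from `corpus` (hypothetical helper).

    Assumption: column 0 of the query table holds the entity key.
    """
    query_keys = {row[0] for row in query}
    width = min(len(row) for row in query)
    # One binary relation per non-key query column.
    query_relations = [{(row[0], row[c]) for row in query} for c in range(1, width)]
    corpus_relations = [rel for table in corpus for rel in project_pairs(table)]

    # scores[i][key][value]: accumulated overlap for query column i + 1.
    scores = [defaultdict(lambda: defaultdict(int)) for _ in query_relations]
    for i, q_rel in enumerate(query_relations):
        for rel in corpus_relations:
            overlap = len(q_rel & rel)  # simple co-occurrence statistic
            if overlap == 0:
                continue
            for key, value in rel:
                if key not in query_keys:
                    scores[i][key][value] += overlap

    # Combine the best-scoring pair per column into one candidate row per key.
    candidate_keys = set().union(*(s.keys() for s in scores))
    ranked = []
    for key in candidate_keys:
        values, total = [], 0
        for s in scores:
            if key in s:
                value, score = max(s[key].items(), key=lambda kv: kv[1])
                values.append(value)
                total += score
            else:
                values.append(None)  # no evidence for this column
        ranked.append((total, (key, *values)))
    ranked.sort(key=lambda item: (-item[0], str(item[1][0])))
    return [row for _, row in ranked]


query = [("Canada", "Ottawa"), ("France", "Paris")]
corpus = [
    [("Canada", "Ottawa", "CAD"), ("Japan", "Tokyo", "JPY"), ("France", "Paris", "EUR")],
    [("Apple", "Red"), ("Banana", "Yellow")],
]
print(expand_table(query, corpus))  # [('Japan', 'Tokyo')]
```

Only the country-capital projection of the first corpus table shares pairs with the query, so it alone contributes a candidate row. The unrelated fruit table and the currency column are ignored without consulting any column labels.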

  • Subjects / Keywords
  • Graduation date
    Fall 2020
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-3qaw-ea55
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.