Neural Relation Extraction on Wikipedia Tables for Augmenting Knowledge Graphs

  • Author / Creator
    Macdonald, Erin
  • Abstract
    Knowledge graphs are an important source of information used in a number of applications including web search, online shopping, social networking, and chatbots. They are an effective way of storing real-world data in a machine-readable format. As a result, the construction of comprehensive, trustworthy knowledge graphs has been a well-researched problem. We present a method for adding new facts to an existing knowledge graph using Wikipedia tables as a source of information. Previous work has primarily focused on extracting facts from text, ignoring the information available in tables.
    We use an existing knowledge graph and distant supervision to annotate a set of Wikipedia tables with relations between pairs of columns. We then run a classifier on these tables to remove as many erroneously collected tables as possible. We also create queries based on table formats identified as indicative of certain relations to increase the number of tables collected. In total, we annotate over 200,000 relational tables with these methods.
    We then train a long short-term memory (LSTM) network on these tables to predict a relation given a table and a pair of columns. We perform an ablation study to identify which features are weighted most heavily and provide the most information to the LSTM. We also explore how two different state-of-the-art word embedding sets fare. Our experiments show that our system correctly predicts which relation a pair of columns represents with over 87% accuracy. We compare our results with two other relation prediction systems that use different datasets of tables and show that our method achieves higher accuracy, though a more direct comparison cannot be performed.
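
The distant-supervision step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy knowledge graph `KG`, the function name `annotate_column_pair`, and the `min_support` threshold are all hypothetical and chosen only for illustration. The idea shown is that a pair of columns is labelled with a relation when enough of its rows match (subject, object) pairs already known to the knowledge graph.

```python
from collections import Counter

# Hypothetical toy knowledge graph: (subject, object) -> set of relation names.
KG = {
    ("Edmonton", "Canada"): {"country"},
    ("Calgary", "Canada"): {"country"},
    ("Paris", "France"): {"country", "capitalOf"},
}


def annotate_column_pair(rows, subj_col, obj_col, kg, min_support=0.5):
    """Return the relations that hold for at least min_support of the rows.

    min_support is an assumed threshold, not a value taken from the thesis.
    """
    if not rows:
        return []
    counts = Counter()
    for row in rows:
        # Look up the cell pair in the knowledge graph; unknown pairs add nothing.
        for relation in kg.get((row[subj_col], row[obj_col]), ()):
            counts[relation] += 1
    threshold = min_support * len(rows)
    return sorted(r for r, c in counts.items() if c >= threshold)


table = [
    ["Edmonton", "Canada"],
    ["Calgary", "Canada"],
    ["Paris", "France"],
    ["Oslo", "Norway"],  # not in the toy knowledge graph
]
labels = annotate_column_pair(table, 0, 1, KG)
print(labels)  # ['country']
```

Rows missing from the knowledge graph, such as the last one above, are the candidate new facts: once the column pair is labelled, they can be proposed as additions to the graph.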

  • Subjects / Keywords
  • Graduation date
    Spring 2020
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-qgqh-vb79
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.