Neural Relation Extraction on Wikipedia Tables for Augmenting Knowledge Graphs

Macdonald, Erin

doi:10.7939/r3-qgqh-vb79

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Macdonald, Erin
Knowledge graphs are an important source of information used in a number of applications including web search, online shopping, social networking, and chatbots. They are an effective way of storing real-world data in a machine-readable format. As a result, the construction of comprehensive, trustworthy knowledge graphs has been a well-researched problem. We present a method for adding new facts to an existing knowledge graph using Wikipedia tables as a source of information. Previous work has primarily focused on extracting facts from text, ignoring the information available in tables.
We use distant supervision with an existing knowledge graph to annotate a set of Wikipedia tables with relations between pairs of columns. Then, we run a classifier on these tables to remove as many erroneously collected tables as possible. We also create queries based on table formats identified as indicative of certain relations to increase the number of tables collected. In total, we annotate over 200,000 relational tables with these methods.
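The distant supervision step can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `annotate_column_pair`, the `min_support` threshold, the exact string matching between cells and entities, and the toy triples are all assumptions made for illustration. The idea shown is only that a column pair is labelled with a relation when enough of its rows match (subject, relation, object) facts already in the knowledge graph.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def annotate_column_pair(
    rows: List[List[str]],
    subj_col: int,
    obj_col: int,
    kg: List[Triple],
    min_support: float = 0.5,  # hypothetical threshold, not from the thesis
) -> Optional[str]:
    """Label a column pair with the relation that most rows support, or None."""
    # Index the knowledge graph by (subject, object) for direct lookup.
    index: Dict[Tuple[str, str], Set[str]] = {}
    for s, r, o in kg:
        index.setdefault((s, o), set()).add(r)

    # Each row whose cell pair matches a known fact votes for that relation.
    votes: Counter = Counter()
    for row in rows:
        for rel in index.get((row[subj_col], row[obj_col]), ()):
            votes[rel] += 1

    if not rows or not votes:
        return None
    relation, count = votes.most_common(1)[0]
    # Keep the label only if a large enough share of rows supports it.
    return relation if count / len(rows) >= min_support else None


# Toy data (hypothetical): exact string match stands in for entity linking.
toy_kg = [
    ("Edmonton", "locatedIn", "Alberta"),
    ("Calgary", "locatedIn", "Alberta"),
    ("Toronto", "locatedIn", "Ontario"),
]
toy_rows = [
    ["Edmonton", "Alberta"],
    ["Calgary", "Alberta"],
    ["Ottawa", "Ontario"],  # not in the toy graph, so it casts no vote
]

forward_label = annotate_column_pair(toy_rows, 0, 1, toy_kg)
reverse_label = annotate_column_pair(toy_rows, 1, 0, toy_kg)
```

In this toy case two of three rows match a known fact, so the pair (column 0, column 1) receives a label, while the reversed pair matches nothing and stays unlabelled. Rows that are missing from the graph, such as the third one, are the kind of new facts a trained model could later contribute.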
We then train a long short-term memory (LSTM) network using these tables to predict a relation given a table and pair of columns. We perform an ablation study to identify which features are weighted most heavily and provide the most information to the LSTM. We also explore how two different state-of-the-art word embedding sets fare. Our experiments show that our system is able to correctly predict which relation a pair of columns represents with over 87% accuracy. We compare our results with two other relation prediction systems that use different datasets of tables and show that our method achieves higher accuracy, though a more direct comparison cannot be performed.
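The general shape of an ablation study can be sketched as follows. This is a generic leave-one-out loop, not the procedure or feature set used in the thesis: the feature group names, the `toy_evaluate` scorer, and its integer scores are hypothetical placeholders. In practice `evaluate` would retrain and score the model on the given feature groups.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


def ablation_study(
    feature_groups: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], int],
) -> Tuple[int, Dict[str, int]]:
    """Return the full-model score and the score drop when each group is removed."""
    all_groups = frozenset(feature_groups)
    baseline = evaluate(all_groups)
    drops = {}
    for group in sorted(all_groups):
        # Re-evaluate with exactly one feature group left out.
        drops[group] = baseline - evaluate(all_groups - {group})
    return baseline, drops


# Hypothetical stand-in for "train and score the model": each group adds a
# fixed number of points in arbitrary units. These are NOT thesis results.
toy_weights = {"column_headers": 30, "cell_values": 20, "table_caption": 5}


def toy_evaluate(groups: FrozenSet[str]) -> int:
    return sum(toy_weights[g] for g in groups)


baseline_score, score_drops = ablation_study(toy_weights, toy_evaluate)
most_informative = max(score_drops, key=score_drops.get)
```

The group whose removal causes the largest drop is read as the one carrying the most information, which is the comparison an ablation study is designed to expose.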
Subjects / Keywords
Graduation date

Spring 2020
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-qgqh-vb79
License

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Barbosa, Denilson (Computing Science)