Extracting Relational Facts from Plain Text and Building Knowledge Graph Out of Them

Parniani,Mohammad Sahand

doi:10.7939/r3-6mcn-b038

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Parniani,Mohammad Sahand
Graph-based Knowledge Bases (KBs) are composed of relational facts, each of which can be viewed as two entities, called head and tail, linked through a relation. The processes of constructing KBs, i.e., populating them with such facts, as well as revising and updating them, are of special importance. Such tasks require automatic methods and procedures, especially when the main sources of facts are textual documents. This research applies Machine Learning and Computational Intelligence methods to the analysis of textual data and recommends a methodology for extracting structured information from unstructured text. The goal is to design a method that extracts triples of the form (head, relation, tail) from sentences. These extracted triples can be used to build graph-based KBs or to update existing ones. The first part of this research investigates the task of Relation Extraction (RE), i.e., predicting the relation that links two entities mentioned in a sentence. With RE, new relational facts can be extracted from unstructured text. In this part, we develop a new RE method based on cleaning the input sequence that is fed to the model. This is done by removing noisy tokens from the sentence using its dependency tree, which helps the model focus on the tokens that contribute most to identifying the relation. We also inject entity type information into the model to improve performance. Our method is tested on the widely used NYT dataset and compared to other state-of-the-art RE methods. Experimental results demonstrate the effectiveness of the developed procedure relative to these methods. The second part of this research focuses on the triple extraction task. The main difference between triple extraction and relation extraction is that in triple extraction the entities are not given in advance and must be extracted from sentences along with the relations between them. The main goal of the triple extraction task is to convert unstructured text to a structured representation in the form of (head, relation, tail). For this task, we developed a transformer-based sequence-to-sequence model to generate the triples. The model comprises an encoder and a decoder, which are initialized from publicly available checkpoints of other transformer models. We compared our model with other state-of-the-art models and showed that our approach achieves strong results in generating triples from the input sequence. Finally, we propose a procedure to create a knowledge graph from extracted triples. We take the triples extracted from the WebNLG dataset and build a weighted knowledge graph out of them.
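As a rough illustration of the final step described in the abstract, the sketch below aggregates extracted triples into a weighted graph. The abstract does not say how edge weights are defined, so this sketch assumes that an edge's weight is the number of times the same triple was extracted. The function name `build_weighted_graph` and the sample triples are hypothetical and are not taken from the thesis or from WebNLG.

```python
from collections import Counter, defaultdict


def build_weighted_graph(triples):
    """Aggregate (head, relation, tail) triples into a weighted adjacency map.

    Assumption (not stated in the abstract): the weight of an edge is the
    number of times the identical triple appears in the input.
    """
    counts = Counter(triples)
    graph = defaultdict(dict)
    for (head, relation, tail), weight in counts.items():
        # Each head node maps (relation, tail) pairs to their weight.
        graph[head][(relation, tail)] = weight
    return {head: dict(edges) for head, edges in graph.items()}


# Hypothetical toy triples, used only to show the data shape.
triples = [
    ("Alan_Turing", "birthPlace", "London"),
    ("Alan_Turing", "field", "Computer_Science"),
    ("Alan_Turing", "birthPlace", "London"),
]
graph = build_weighted_graph(triples)
```

Keeping the relation as part of the edge key lets two entities be linked by several distinct relations, each with its own weight. Other weighting schemes, such as model confidence scores, would fit the same structure.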
Subjects / Keywords
Graduation date

Fall 2022
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/r3-6mcn-b038
License

This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Electrical and Computer Engineering
Specialization
- Software Engineering and Intelligent Systems
Supervisor / co-supervisor and their department(s)
- Reformat, Marek (Electrical and Computer Engineering)