Extracting Relational Facts from Plain Text and Building Knowledge Graph Out of Them

  • Author / Creator
    Parniani, Mohammad Sahand
  • Graph-based Knowledge Bases (KBs) are composed of relational facts, each consisting of two entities, called the head and the tail, linked through a relation. Constructing KBs, i.e., populating them with such facts, as well as revising and updating them, is of special importance. These tasks require automatic methods and procedures, especially when the main sources of facts are textual documents. This research applies Machine Learning and Computational Intelligence methods to the analysis of textual data and recommends a methodology for extracting structured information from unstructured text. The goal is to design and propose a method for extracting triples of the form (head, relation, tail) from sentences. The extracted triples can be used to build graph-based KBs or to update existing ones.

    For the first part of this research, we investigate the task of Relation Extraction (RE), i.e., predicting the relation that links two entities mentioned in a sentence; RE processes are used to extract new relational facts from unstructured text. In this part, we develop a new RE method based on cleaning the input sequence fed to the model: noisy tokens are removed from the sentence using its dependency tree, which helps the model focus on the tokens that contribute most to identifying the relation. We also inject entity type information into the model to improve performance. Our method is tested on the widely used NYT dataset and compared with other state-of-the-art RE methods, and the experimental results demonstrate the effectiveness of the developed procedure compared with these methods.

    For the second part of this research, we focus on the triple extraction task. The main difference between triple extraction and relation extraction is that in triple extraction the entities are not given in advance: they must be extracted from the sentence along with the relations between them. The main goal of triple extraction is to convert unstructured text into a structured representation of the form (head, relation, tail). For this task, we developed a transformer-based sequence-to-sequence model that generates the triples. The model comprises an encoder and a decoder, both initialized from publicly available checkpoints of other transformer models. We compared our model with other state-of-the-art models and showed that our approach achieves strong results in generating triples from the input sequence.

    Finally, we propose a procedure for creating a knowledge graph from extracted triples: we take the triples extracted from the WebNLG dataset and build a weighted knowledge graph out of them.
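    The abstract does not specify the exact rule used to remove noisy tokens or how entity types are injected. As a minimal sketch only, assuming the common shortest-dependency-path heuristic, single-token entities, and hypothetical type-marker tokens such as `<PER>`/`</PER>` (none of which is confirmed as the thesis's method), the pruning step could look like this. The dependency parse is written by hand as a list of head indices; in practice it would come from a parser. The function names are hypothetical.

```python
from collections import deque


def shortest_dependency_path(heads, start, end):
    """Return indices of tokens on the shortest dependency path, in sentence order.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    Edges are treated as undirected so the path can go up and down the tree.
    """
    adj = [[] for _ in heads]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)

    # Breadth-first search from start, remembering each node's predecessor.
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)

    if end not in prev:
        return []  # no path (e.g. a disconnected parse)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = prev[node]
    return sorted(path)  # restore original word order


def prune_with_types(tokens, heads, head_idx, tail_idx, head_type, tail_type):
    """Keep only path tokens and wrap the two entities in hypothetical type markers."""
    pruned = []
    for i in shortest_dependency_path(heads, head_idx, tail_idx):
        if i == head_idx:
            pruned += [f"<{head_type}>", tokens[i], f"</{head_type}>"]
        elif i == tail_idx:
            pruned += [f"<{tail_type}>", tokens[i], f"</{tail_type}>"]
        else:
            pruned.append(tokens[i])
    return pruned


# Hand-written parse of "Obama was reportedly born in Honolulu ."
tokens = ["Obama", "was", "reportedly", "born", "in", "Honolulu", "."]
heads = [3, 3, 3, -1, 5, 3, 3]
print(prune_with_types(tokens, heads, 0, 5, "PER", "LOC"))
# ['<PER>', 'Obama', '</PER>', 'born', '<LOC>', 'Honolulu', '</LOC>']
```

    Here the preposition "in" is dropped because it attaches to "Honolulu" rather than lying on the path itself; practical pruning rules often keep such children, which is why the exact rule matters.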
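    For a sequence-to-sequence model to generate triples, the target triples have to be written as one flat string that the decoder can produce and that can be parsed back afterwards. The abstract does not give this output format; the sketch below assumes a hypothetical linearization with `<S>`, `<P>`, and `<O>` separator tokens, and the example facts are illustrative only.

```python
import re

# Hypothetical separator tokens; the thesis's actual output format is not stated.
# [^<] keeps each field from running across separators, so names containing "<"
# are not supported by this sketch.
TRIPLE_RE = re.compile(r"<S>\s*([^<]*?)\s*<P>\s*([^<]*?)\s*<O>\s*([^<]*?)\s*(?=<S>|$)")


def linearize(triples):
    """Turn (head, relation, tail) tuples into one target string for the decoder."""
    return " ".join(f"<S> {h} <P> {r} <O> {t}" for h, r, t in triples)


def parse(text):
    """Recover (head, relation, tail) tuples from a generated string.

    Fragments that are missing one of the three parts are skipped.
    """
    return [m.groups() for m in TRIPLE_RE.finditer(text)]


triples = [("Alan Bean", "nationality", "United States"),
           ("Alan Bean", "occupation", "Test pilot")]
target = linearize(triples)
print(target)
# <S> Alan Bean <P> nationality <O> United States <S> Alan Bean <P> occupation <O> Test pilot
print(parse(target) == triples)  # True
```

    A reversible format like this lets generated output be scored triple by triple against the reference facts.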
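    The abstract does not say how edge weights in the knowledge graph are defined. One plausible choice, assumed here purely for illustration, is to weight each (head, relation, tail) edge by the number of times that triple was extracted. The sketch uses only the standard library, and `build_weighted_graph` is a hypothetical name.

```python
from collections import Counter, defaultdict


def build_weighted_graph(triples):
    """Build a directed, relation-labelled graph from extracted triples.

    Returns {head: {(relation, tail): weight}}, where weight is the number
    of times the same (whitespace-trimmed) triple was extracted; this
    weighting scheme is an assumption.
    """
    counts = Counter((h.strip(), r.strip(), t.strip()) for h, r, t in triples)
    graph = defaultdict(dict)
    for (h, r, t), weight in counts.items():
        graph[h][(r, t)] = weight
    return dict(graph)


extracted = [("Alan Bean", "occupation", "Test pilot"),
             ("Alan Bean", "occupation", "Test pilot "),  # same fact, stray space
             ("Alan Bean", "nationality", "United States")]
graph = build_weighted_graph(extracted)
print(graph["Alan Bean"][("occupation", "Test pilot")])  # 2
```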

  • Graduation date
    Fall 2022
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-6mcn-b038
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.