Learning Hierarchies from Knowledge Graphs

  • Author / Creator
    Pietrasik, Marcin
  • Knowledge graphs are data storage structures that rely on principles from graph theory to represent information. Specifically, facts are stored as triples which bring together two entities via a predicate. In a graphical context, these entities are analogous to nodes, and the relations between them are analogous to edges. In recent years, using graph structures to model and store data has been garnering an increasing amount of attention among practitioners in sectors ranging from academia to government to industry. Indeed by some measures, graph database management systems are the fastest growing database type over the past decade and large scale public knowledge bases counting billions of triples have become widely available. The open access to such amounts of graph data has spurred on its use in research related to the Semantic Web, artificial intelligence, and computer science broadly. One field of research which has received less attention is that of learning hierarchies from knowledge graphs. Learning hierarchies from knowledge graphs refers to a broad concept that encompasses distinct tasks related by their induction of hierarchical relations between knowledge graph entities. Perhaps the simplest reason for learning hierarchical structures is that they organize data in a way that is highly intuitive and interpretable to humans. Indeed, the most widely used knowledge bases are organized by hierarchical structures, namely trees and directed acyclic graphs. That is to say, knowledge graphs are hierarchical at their core. With this in mind, this doctoral work proposes novel methods and models for learning hierarchies from knowledge graphs using artificial intelligence. In doing so, it seeks to advance both hierarchy induction as well as the application of hierarchies to common tasks and problems in the knowledge graph community. The first problem confronted is that of class taxonomy induction. In this regard, a novel method using a greedy algorithm based on class frequencies and co-occurrences is proposed. It's shown empirically that this method is capable of inducing well structured taxonomies and that it outperforms existing class taxonomy induction methods. The second problem is that of using hierarchies to improve knowledge graph embeddings. Embeddings are one of the most widely investigated knowledge graph representations and their utility reaches far to downstream tasks such as link prediction and entity classification. With this in mind, a meta-strategy involving hierarchically coarsening a knowledge graph before embedding is proposed. This allows the embedding process to be performed on a smaller graph, thereby reducing computational complexity. It's shown that such a strategy is capable of attaining faster and oftentimes higher quality knowledge graph embeddings. The third problem investigated is that of learning a hierarchical clustering of a knowledge graph's entities. To this end, a class of probabilistic models called stochastic blockmodels are leveraged. First, it is shown that blockmodels are capable of modelling the intricacies of knowledge graphs by proposing a novel model for knowledge graph generation. This model fuses blockmodelling together with neural networks in what is a first in the context of knowledge graphs. Empirical results demonstrate that such an approach yields results comparable to state-of-the-art methods on real world datasets. Afterwards, a fully probabilistic model is proposed for hierarchical clustering of a knowledge graph's entities. The model is presented in a non-parametric and fully probabilistic framework, allowing for flexibility in learning the structure of the hierarchy. Results indicate that such an approach is capable of learning a coherent cluster hierarchy as per quantitative and qualitative evaluation.

  • Subjects / Keywords
  • Graduation date
    Spring 2023
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.