Constructing Knowledge Graphs with Language Models and Learning Hierarchies from Graphs using Probabilistic Topic Modeling

  • Author / Creator
    Zhang, Yujia
  • Knowledge graphs (KGs) use a data model structured as a graph or topology to represent and manipulate data. A KG consists of interconnected factual statements, each made up of two distinct entities, the subject and the object, linked by a specified relation known as the predicate. These graphs find applications in recommendation systems, logical reasoning, and question answering. They enable machines to interpret the relationships between entities and to draw conclusions from the structured information the graphs contain. Constructing, revising, and augmenting such KGs therefore warrants particular scholarly attention.

    KG construction is fundamental to organizing and representing structured knowledge extracted from unstructured text data. KGs can be constructed more effectively with advanced language models that have substantial computational capabilities. The effectiveness of these models lies in understanding textual data, extracting facts, and synthesizing content. Our study evaluates the capacity of these models to identify entities and relationships that carry contextual semantics. Using these capabilities can improve the quality and comprehensiveness of KGs. Moreover, fine-tuning transformer-based models allows them to adapt to specific domains, which enhances the relevance and accuracy of the extracted knowledge.

    The hierarchical analysis of KGs is instrumental in uncovering the latent structures inherent in knowledge base data. Probabilistic topic modeling analyzes text corpora by identifying latent topics that represent the underlying themes and content patterns in documents. Drawing inspiration from it, our research adapts and extends these analytical frameworks for the hierarchical exploration of KGs. Specifically, we introduce nonparametric probabilistic models, which offer flexibility in how the hierarchy is arranged. The models are built by adapting the Hierarchical Latent Dirichlet Allocation algorithm and the Nested Hierarchical Dirichlet Process.

    We evaluate these models quantitatively and qualitatively by analyzing the topics and word distributions that define the hierarchical structure of complex KGs. In doing so, we aim to improve our understanding of the connections and dependencies within KGs, facilitating more robust and scalable knowledge representation. Furthermore, our research seeks to identify potential improvements in the algorithms used for hierarchical analysis, ultimately contributing to more efficient methods for managing and using large-scale knowledge bases. This approach provides deeper insight into the structure of KGs and supports semantic search, ontology development, and automated reasoning.

  • Subjects / Keywords
  • Graduation date
    Fall 2024
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-m3fa-tx69
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.