Constructing Knowledge Graphs with Language Models and Learning Hierarchies from Graphs using Probabilistic Topic Modeling

Zhang, Yujia

doi:10.7939/r3-m3fa-tx69

Knowledge graphs (KGs) use a graph-structured data model to represent and manipulate data. A KG consists of interconnected factual statements, each made up of two entities, the subject and the object, linked by a relation called the predicate. KGs are used in recommendation systems, logical reasoning, and question answering, where they let machines represent the relationships between entities and draw conclusions from that structured information. Constructing, revising, and augmenting KGs therefore merits particular research attention.
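The subject, predicate, object structure described above can be illustrated with a minimal sketch. The `Triple` type, the helper functions, and the example facts are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One factual statement: two entities linked by a relation."""
    subject: str
    predicate: str
    object: str


# Hypothetical example facts (assumed for illustration only).
triples = [
    Triple("Edmonton", "located_in", "Alberta"),
    Triple("Alberta", "located_in", "Canada"),
    Triple("University of Alberta", "located_in", "Edmonton"),
]


def build_index(triples):
    """Index outgoing edges by subject so the triples can be walked as a graph."""
    index = defaultdict(list)
    for t in triples:
        index[t.subject].append((t.predicate, t.object))
    return index


def reachable(index, start, predicate):
    """Follow a single predicate transitively from `start`; return the objects found."""
    seen = []
    frontier = [start]
    while frontier:
        node = frontier.pop()
        # .get avoids creating empty entries for entities with no outgoing edges.
        for pred, obj in index.get(node, []):
            if pred == predicate and obj not in seen:
                seen.append(obj)
                frontier.append(obj)
    return seen


index = build_index(triples)
print(reachable(index, "University of Alberta", "located_in"))
# -> ['Edmonton', 'Alberta', 'Canada']
```

Following `located_in` edges is a very simple instance of drawing a conclusion from stored facts: the graph never states that the university is in Canada, but the chain of triples implies it.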

KG construction is fundamental to organizing and representing structured knowledge drawn from unstructured text. Advanced, computationally powerful language models make this construction more effective because they can interpret text, extract facts, and synthesize content. Our study evaluates the capacity of these models to identify entities and relationships together with their contextual semantics, which in turn improves the quality and comprehensiveness of the resulting KGs. Moreover, fine-tuning transformer-based models allows them to adapt to specific domains, enhancing the relevance and accuracy of the extracted knowledge.

Hierarchical analysis of KGs is instrumental in uncovering the latent structures in knowledge base data. Probabilistic topic modeling analyzes text corpora by identifying latent topics that represent the underlying themes and content patterns in documents; our research adapts and extends these frameworks to the hierarchical exploration of KGs. Specifically, we introduce models in a nonparametric, probabilistic setting, so the shape of the hierarchy is inferred from the data rather than fixed in advance. The models are built by adapting the Hierarchical Latent Dirichlet Allocation algorithm and the Nested Hierarchical Dirichlet Process.
We evaluate these models quantitatively and qualitatively by analyzing the topics and word distributions that define the hierarchical structure of complex KGs. In doing so, we aim to improve our understanding of the connections and dependencies within KGs and to support more robust and scalable knowledge representation. Our research also seeks to identify potential improvements to the algorithms used for hierarchical analysis, contributing to more efficient methods for managing and using large-scale knowledge bases. This approach provides deeper insight into the structural dynamics of KGs and paves the way for applications in semantic search, ontology development, and automated reasoning.
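To give a sense of what "nonparametric" means for a topic hierarchy, the sketch below samples root-to-leaf paths from a nested Chinese restaurant process, the prior used in standard Hierarchical Latent Dirichlet Allocation for text. It is a minimal illustration under stated assumptions, not the adapted models from the thesis: the `Node` class, the `sample_path` function, and the parameter names `depth` and `gamma` are hypothetical, and the abstract does not say how KG elements are mapped onto documents and words.

```python
import random


class Node:
    """A node in the topic tree; `count` is how many items have passed through it."""

    def __init__(self):
        self.count = 0
        self.children = []


def sample_path(root, depth, gamma, rng):
    """Draw one root-to-leaf path with `depth` nodes from a nested CRP prior.

    At each level an existing child is chosen with probability proportional
    to its count, and a new child is created with probability proportional
    to `gamma` (assumed > 0).
    """
    root.count += 1
    path = [root]
    node = root
    for _ in range(depth - 1):
        # Child counts exclude the current item, which has not yet chosen a child.
        weights = [child.count for child in node.children] + [gamma]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(node.children):
            node.children.append(Node())  # open a new branch
        node = node.children[choice]
        node.count += 1
        path.append(node)
    return path


rng = random.Random(0)
root = Node()
paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
print(len(root.children), "branches below the root after 20 draws")
```

The number of branches is not set in advance: it grows as more items arrive, and a larger `gamma` makes new branches more likely. The depth is fixed in this sketch, as in standard Hierarchical Latent Dirichlet Allocation.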
Graduation date

Fall 2024
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/r3-m3fa-tx69
License

This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Electrical and Computer Engineering
Specialization
- Software Engineering and Intelligent Systems
Supervisor / co-supervisor and their department(s)
- Marek, Reformat (Electrical and Computer Engineering)