Large-scale Document Understanding with Knowledge Graphs for Medical Applications

  • Author / Creator
    Costello, Jeremy
  • We introduce the field of natural language processing, outlining the benefits and drawbacks of rule-based versus statistical methods. We present knowledge graphs as a way to combine the explainability of rule-based methods with the power of statistical methods, large language models in particular. The accuracy of natural language processing methods is paramount in sensitive fields such as biomedicine. We aim to create a knowledge graph to help practitioners, caretakers, and patients affected by neurodevelopmental disorders.
    We provide background on knowledge graphs, topic modeling, and reinforcement learning. We describe what knowledge graphs are, how they are created, and the natural language processing methods used to extract data from text to populate them. We give a short history of topic modeling, followed by an outline of latent Dirichlet allocation, dynamic topic models, topic model evaluation, and recent advances in neural topic modeling. We explain what reinforcement learning is and outline the main approaches to it.
    We develop a pipeline for creating a knowledge graph on neurodevelopmental disorders. We scrape data from both professional academic sources and non-professional webpages, including pages on finances and services for caretakers and patients affected by neurodevelopmental disorders. We take input from practitioners, caretakers, and patients throughout the knowledge graph creation process to generate a graph that is as useful as possible for non-professionals, in contrast to many existing medical knowledge graphs that incorporate only academic sources.
    To improve the topic modeling stage of our knowledge graph creation pipeline, we develop a new topic model using reinforcement learning. We make further improvements to the topic model, including modernizing the neural network architecture, weighting the ELBO loss, and using contextual embeddings (an illustrative sketch of a weighted ELBO follows the abstract). Our unsupervised model outperforms all other unsupervised models and performs on par with or better than most models that use supervised labeling. We conduct an ablation study to determine which changes to our model are the most important.
    We then extract triples directly from text using large language models. With the assistance of volunteers, we create two new data sets about Fragile X syndrome: one for named-entity recognition and one for relation extraction. We compare a model trained on our Fragile X data set to a model trained on a less specific data set, and identify strengths and weaknesses of both (an illustrative triple-extraction sketch also follows the abstract). Our method is likely outdated due to the rapid pace of advancements in large language models.
    We give a short concluding statement summarizing what we have done, and provide some brief thoughts on the future of natural language processing for biomedical applications.
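
    A minimal sketch of the weighted-ELBO idea, assuming a ProdLDA-style variational autoencoder with a Gaussian approximate posterior; the function name, tensor names, and the kl_weight parameter are illustrative assumptions, not the thesis implementation.

    # Hypothetical weighted ELBO for a neural topic model (ProdLDA-style VAE).
    # All names here are placeholders; the weighting factor rebalances the
    # reconstruction and KL terms instead of summing them equally.
    import torch.nn.functional as F

    def weighted_elbo_loss(bow, recon_logits, mu, logvar, kl_weight=0.5):
        # bow:          (batch, vocab) bag-of-words counts per document
        # recon_logits: (batch, vocab) decoder output (unnormalized log word scores)
        # mu, logvar:   (batch, n_topics) parameters of the approximate posterior
        # Negative log-likelihood of the observed word counts under the decoder.
        recon = -(bow * F.log_softmax(recon_logits, dim=-1)).sum(dim=-1)
        # KL divergence between the Gaussian posterior and a standard normal prior.
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)
        return (recon + kl_weight * kl).mean()

    Setting kl_weight below 1 is one common way to keep the approximate posterior from collapsing to the prior early in training, which is a typical motivation for weighting the ELBO terms.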
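
    An illustrative sketch of combining named-entity recognition and relation extraction outputs into knowledge graph triples; the model path and the relation_classifier callable are hypothetical placeholders, not the models trained on the Fragile X data sets.

    # Placeholder pipeline: find entities, then ask a relation classifier whether
    # a relation holds between each pair, keeping (head, relation, tail) triples.
    from transformers import pipeline

    ner = pipeline("token-classification",
                   model="path/to/ner-model",  # hypothetical model path
                   aggregation_strategy="simple")

    def extract_triples(sentence, relation_classifier):
        entities = ner(sentence)
        triples = []
        for i, head in enumerate(entities):
            for tail in entities[i + 1:]:
                relation = relation_classifier(sentence, head["word"], tail["word"])
                if relation is not None:
                    triples.append((head["word"], relation, tail["word"]))
        return triples

    Triples produced this way could then be merged into the knowledge graph alongside those drawn from academic and non-professional sources.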

  • Subjects / Keywords
  • Graduation date
    Spring 2024
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-b549-9m42
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.