Some Bioinformatics Studies on SARS-CoV-2

Mitra, Sangita

doi:10.7939/r3-m4k7-3914

Abstract

The ongoing COVID-19 pandemic is affecting the lives of billions of people worldwide, as well as medical and socioeconomic systems. The genomic variability of the virus allows it to remain prevalent in humans around the world for a long time and to spread from one place to another. Understanding the trends of SARS-CoV-2 requires a detailed study of its molecular epidemiology, evolutionary models, and phylogenetics. In this dissertation, we perform several bioinformatics studies on coronaviruses and SARS-CoV-2, focusing on their evolution. We perform time series analyses of mutations in the spike, membrane, and envelope proteins of SARS-CoV-2 to understand how they evolve over time. The spike protein plays a vital role in binding to the human ACE2 receptor. We investigate the implication, co-occurrence, and recurrence of spike mutations. The D614G mutation increases infectivity, and our implication analysis found that 98% of the time, if the D614G mutation occurs, 28 other mutations also occur in the spike protein. We found several recurrent mutation pairs in the spike protein that appeared periodically. We analyze the relationship between SARS-CoV-2 and two previous outbreaks, SARS-CoV and MERS-CoV, in terms of the time series of spike protein mutations. We also analyze the mutation rates of six variants of interest and variants of concern to understand how the number of mutations changes over time. We observed that the COVID-19 pandemic follows some time-series patterns and therefore applied forecasting to predict upcoming mutations. To this end, an encoder-decoder long short-term memory (LSTM) network is applied to predict nucleotide mutations and spike protein mutations at certain positions of SARS-CoV-2. We propose two bootstrapping techniques as statistical tests to evaluate the model's performance in general and its prediction at each mutation site. The statistical tests show that our model is highly robust in its predictions at most sites despite missing data. The results show that the forecasts are more confident at some biologically significant sites than at insignificant ones.
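The abstract does not spell out how the implication analysis is computed. One plausible reading is the confidence of an association rule: among sequences that carry one mutation, the fraction that also carry another. The sketch below illustrates that reading only and is not the thesis's actual method. The function names, the placeholder mutations `MUT_A` and `MUT_B`, and the toy samples are all invented for illustration.

```python
from typing import Dict, List, Set


def implication_confidence(
    samples: List[Set[str]], antecedent: str, consequent: str
) -> float:
    """Fraction of samples carrying `antecedent` that also carry `consequent`.

    Returns NaN when no sample carries the antecedent.
    """
    with_antecedent = [s for s in samples if antecedent in s]
    if not with_antecedent:
        return float("nan")
    hits = sum(1 for s in with_antecedent if consequent in s)
    return hits / len(with_antecedent)


def implied_mutations(
    samples: List[Set[str]], antecedent: str, threshold: float
) -> Dict[str, float]:
    """Mutations whose implication confidence meets `threshold` (an assumed cutoff)."""
    candidates = set().union(*samples) - {antecedent}
    scores = {m: implication_confidence(samples, antecedent, m) for m in candidates}
    return {m: c for m, c in scores.items() if c >= threshold}


# Toy data, invented for illustration: each set lists the spike mutations
# observed in one hypothetical sequence.
samples = [
    {"D614G", "MUT_A"},
    {"D614G", "MUT_A", "MUT_B"},
    {"D614G", "MUT_A"},
    {"D614G", "MUT_B"},
    {"MUT_B"},
]

print(implication_confidence(samples, "D614G", "MUT_A"))
print(implied_mutations(samples, "D614G", threshold=0.7))
```

Under this reading, the 98% figure would correspond to a threshold of 0.98 with D614G as the antecedent, applied to real sequence data instead of the toy samples.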
Subjects / Keywords
Graduation date

Fall 2021
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-m4k7-3914
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Lin, Guo-Hui (Computing Science)