Learning Metabolite Tandem Mass Spectra Predictors From Molecular Graph Structure

  • Author / Creator
    Fei Wang
  • Abstract
    In the field of metabolomics, mass spectrometry (MS) is the most widely adopted method for identifying metabolites. Conventionally, metabolite identification involves matching the target mass spectrum against experimentally acquired reference mass spectral libraries. However, the limited coverage of these reference libraries has created a major bottleneck for this approach. In the past few decades, several alternative approaches have been developed to address this limited coverage. These include in-silico fragmentation methods, which generate reference mass spectra from chemical structures and can therefore extend existing MS reference libraries with synthetic spectra. While traditional in-silico fragmentation methods rely on hand-crafted rules, many recent approaches use machine learning to extract MS fragmentation rules.

    This dissertation extends a state-of-the-art machine learning process, Competitive Fragmentation Modeling (CFM-ID), which uses a learned model to simulate the MS fragmentation process that occurs in a tandem mass spectrometer. While CFM-ID is an important step forward from hand-coded rule-based approaches, it is still unable to produce sufficiently accurate mass spectra, and therefore cannot yet be seen as a reliable alternative to laboratory mass spectrometry. My primary research contribution is to extend Competitive Fragmentation Modeling methods by learning parameters from the topological structure of a molecule. In the tandem mass spectrum prediction task, our models showed significant improvement compared to the original CFM-ID models across multiple data sets. Furthermore, we developed several sampling methods that greatly reduced the computational cost of training the model while still surpassing legacy CFM-ID models by a significant margin in spectrum prediction tasks.

  • Subjects / Keywords
  • Graduation date
    Spring 2020
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-kr6k-qy63
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.