Competitive Fragmentation Modeling of Mass Spectra for Metabolite Identification

  • Author / Creator
    Allen, Felicity R
  • One of the key obstacles to the effective use of mass spectrometry (MS) in high-throughput metabolomics is the difficulty of interpreting measured spectra to identify metabolites accurately and efficiently. Traditional methods for automated metabolite identification compare the target MS spectrum to spectra of known molecules in a reference database, ranking candidate molecules by the closeness of the spectral match. However, the limited coverage of available databases has led to interest in computational methods for generating accurate reference MS spectra from chemical structures. This is the target application for this work. My main research contribution is to propose a method for spectrum prediction, which we call Competitive Fragmentation Modeling (CFM). I demonstrate that this method works effectively for both electron ionization (EI)-MS and electrospray tandem MS (ESI-MS/MS). It uses a probabilistic generative model for the fragmentation processes occurring in a mass spectrometer, and a machine learning approach to learn parameters for this model from data. CFM has been used both in a spectrum prediction task (i.e., predicting the mass spectrum from a chemical structure) and in a putative metabolite identification task (ranking possible structures for a target spectrum). In the spectrum prediction task, CFM showed improved performance when compared to a full enumeration of all peaks corresponding to all substructures of the molecule. In the metabolite identification task, CFM obtained substantially better rankings for the correct candidate than existing methods. As further validation, this method won the structure identification category of the international Critical Assessment of Small Molecule Identification (CASMI) 2014 competition. The method is also available for general use via a web interface.
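
    The ranking step described in the abstract (scoring each candidate's reference or predicted spectrum against the target spectrum and sorting by closeness) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a binned cosine similarity, which is a common generic choice and not necessarily the scoring function used in the thesis, and the function names, the bin width, and the toy peak lists are all hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(target_peaks, candidate_spectra, bin_width=1.0):
    """Return (name, score) pairs sorted from best to worst spectral match."""
    target = bin_spectrum(target_peaks, bin_width)
    scored = [
        (name, cosine_similarity(target, bin_spectrum(peaks, bin_width)))
        for name, peaks in candidate_spectra.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy (m/z, intensity) peak lists, invented purely for illustration.
target = [(50.0, 10.0), (77.0, 100.0), (105.0, 60.0)]
candidates = {
    "candidate_A": [(51.0, 20.0), (91.0, 100.0)],
    "candidate_B": [(50.0, 12.0), (77.0, 95.0), (105.0, 55.0)],
}
ranking = rank_candidates(target, candidates)
best_name, best_score = ranking[0]
```

    In this sketch the spectra in `candidates` stand in for either database reference spectra or spectra predicted from chemical structures; only the source of the candidate spectra differs between the two settings, while the ranking step stays the same.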

  • Graduation date
    Spring 2016
  • Degree
    Doctor of Philosophy
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Institution
    University of Alberta
  • Examining committee members and their departments
    • Schuurmans, Dale (Computing Science)
    • Neumann, Steffen (Department of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry – IPB Halle)
    • Wishart, David (Computing Science)
    • Greiner, Russell (Computing Science)
    • Harynuk, James (Chemistry)