






This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations

Competitive Fragmentation Modeling of Mass Spectra for Metabolite Identification
Open Access


Keywords: metabolite identification; machine learning; mass spectrometry
Degree grantor: University of Alberta
Author or creator: Allen, Felicity R
Supervisor and department: Greiner, Russell (Computing Science)
Examining committee members and departments:
Harynuk, James (Chemistry)
Wishart, David (Computing Science)
Greiner, Russell (Computing Science)
Schuurmans, Dale (Computing Science)
Neumann, Steffen (Department of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry – IPB Halle)
Department: Department of Computing Science
Degree: Doctor of Philosophy
Abstract
One of the key obstacles to the effective use of mass spectrometry (MS) in high-throughput metabolomics is the difficulty of interpreting measured spectra to identify metabolites accurately and efficiently. Traditional methods for automated metabolite identification compare the target MS spectrum to the spectra of known molecules in a reference database, ranking candidate molecules by how closely their spectra match. However, the limited coverage of available databases has led to interest in computational methods for generating accurate reference MS spectra from chemical structures. This is the target application for this work. My main research contribution is a method for spectrum prediction, which we call Competitive Fragmentation Modeling (CFM). I demonstrate that this method works effectively for both electron ionization MS (EI-MS) and electrospray ionization tandem MS (ESI-MS/MS). CFM uses a probabilistic generative model of the fragmentation processes that occur in a mass spectrometer, together with a machine learning approach that learns the parameters of this model from data. CFM has been applied to both a spectrum prediction task (i.e., predicting the mass spectrum from a chemical structure) and a putative metabolite identification task (ranking possible structures for a target spectrum). In the spectrum prediction task, CFM performed better than a full enumeration of the peaks corresponding to all substructures of the molecule. In the metabolite identification task, CFM ranked the correct candidate substantially higher than existing methods did. As further validation, this method won the structure identification category of the international Critical Assessment of Small Molecule Identification (CASMI) 2014 competition. The method is also available for general use via a web interface.
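The abstract combines two ideas: predicting a spectrum from a probabilistic model in which possible fragmentations compete, and ranking candidate structures by how closely their spectra match a target. The Python sketch below illustrates both under loudly stated assumptions. The fragment masses, break scores and target peaks are made-up numbers rather than learned parameters or real data. The softmax over competing breaks (with a "stay intact" option) and the two-step horizon are illustrative choices, not the thesis's actual model or features. The tolerance-matched cosine score is a common spectral similarity measure assumed here, not necessarily the one CFM uses. All function names (`competitive_step`, `predict_spectrum`, `cosine_score`, `rank_candidates`) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# A spectrum is a list of (m/z, intensity) pairs.
Spectrum = List[Tuple[float, float]]


def competitive_step(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn competing outcome scores into probabilities with a softmax."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def predict_spectrum(masses: Dict[str, float],
                     breaks: Dict[str, Dict[str, float]],
                     root: str, steps: int = 2) -> Spectrum:
    """Propagate probability mass through a toy fragmentation graph.

    masses: fragment id -> m/z (hypothetical values)
    breaks: fragment id -> {child fragment id: break score} (hypothetical)
    """
    probs = {root: 1.0}
    for _ in range(steps):
        nxt: Dict[str, float] = {}
        for frag, p in probs.items():
            options = dict(breaks.get(frag, {}))
            options[frag] = 0.0  # "stay intact" competes with every break
            for child, q in competitive_step(options).items():
                nxt[child] = nxt.get(child, 0.0) + p * q
        probs = nxt
    # Fragments with the same m/z add up to a single peak.
    peaks: Dict[float, float] = {}
    for frag, p in probs.items():
        peaks[masses[frag]] = peaks.get(masses[frag], 0.0) + p
    return sorted(peaks.items())


def cosine_score(a: Spectrum, b: Spectrum, tol: float = 0.01) -> float:
    """Cosine similarity; each peak of a is greedily paired with the
    nearest unused peak of b within tol."""
    used = set()
    dot = 0.0
    for mz_a, i_a in a:
        best = None
        for j, (mz_b, _) in enumerate(b):
            if j in used or abs(mz_a - mz_b) > tol:
                continue
            if best is None or abs(mz_a - mz_b) < abs(mz_a - b[best][0]):
                best = j
        if best is not None:
            used.add(best)
            dot += i_a * b[best][1]
    norm_a = math.sqrt(sum(i * i for _, i in a))
    norm_b = math.sqrt(sum(i * i for _, i in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(target: Spectrum, candidates: Dict[str, Spectrum],
                    tol: float = 0.01) -> List[Tuple[str, float]]:
    """Rank candidate structures by how closely their spectra match the target."""
    scored = [(name, cosine_score(target, spec, tol))
              for name, spec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up example: two candidate structures sharing a precursor at m/z 180.06.
spec_a = predict_spectrum({"M": 180.06, "F1": 163.06, "F2": 145.05},
                          {"M": {"F1": 1.0}, "F1": {"F2": 0.5}}, root="M")
spec_b = predict_spectrum({"M": 180.06, "G1": 121.03},
                          {"M": {"G1": 1.0}}, root="M")
target = [(145.05, 0.5), (163.06, 0.45), (180.06, 0.05)]
ranking = rank_candidates(target, {"A": spec_a, "B": spec_b})
print(ranking)
```

Because each fragment's options are normalized together, raising the score of one break lowers the probability of its competitors; this is the sense of "competitive" the toy illustrates. In this made-up example, candidate A shares most of its predicted peaks with the target and so ranks above candidate B.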
This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
Citation for previous publication
Allen F., Pon A., Wilson M., Greiner R., Wishart D., "CFM-ID: A web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra", Nucleic Acids Research, 42 (W1): W94-99, 2014.

File Details

File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 2,281,850 bytes
Last modified: 2016-06-16 17:12:54 (UTC-06:00)
Filename: Allen_Felicity_R_201601_PhD.pdf
Original checksum: 6f763bd81a466dbb34765a50ca1e8ef4
Well formed: true
Valid: true
File title: Competitive Fragmentation Modeling of Mass Spectra for Metabolite Identification
File author: Felicity Allen, University of Alberta
Page count: 125