Method Development for High-Coverage Metabolome Analysis

  • Author / Creator
    Li, Hao
  • Metabolomics is an interdisciplinary research area that requires the integration of knowledge from several disciplines, such as analytical chemistry, bioinformatics, and statistics. The general workflow of metabolome analysis includes sample preparation, data acquisition, data processing, data analysis, and metabolite identification. An improvement in any of these steps can affect the outcome of a metabolomics study. On the experimental side, tremendous efforts have been made to expand metabolome coverage and produce high-quality data. The bottleneck has gradually shifted to data interpretation, such as metabolite identification, especially for unknown compounds, and the choice of appropriate methods for data analysis.
    In response to these challenges, my thesis research focuses on improving data interpretation. In the first part, the metabolome coverage achieved by combining multi-channel chemical isotope labeling (CIL) methods was evaluated (Chapter 2). Metabolite information was extracted from current metabolomics databases. Based on their functional groups, metabolites were classified into four sub-metabolomes corresponding to the four CIL channels: the amine/phenol, carboxylic acid, carbonyl, and hydroxyl channels. From the perspective of chemical functional groups, or chemical space, near-complete metabolome coverage could be achieved by integrating the four sub-metabolomes.
    The second part targeted putative metabolite identification. Instead of building in-house libraries through data acquisition of metabolite standards, in silico prediction using existing data was employed. In Chapter 3, to refine tripeptide identification, which otherwise relies on exact mass matching alone, the retention time (RT) of chemical isotope labeled tripeptides was predicted from the RT of labeled dipeptides. In Chapter 4, MCID 2.0, an evidence-based metabolome library, was constructed using 76 biological reactions. To facilitate the identification of unknown metabolites, theoretical metabolites were predicted from the metabolites in the KEGG COMPOUND database.
    Lastly, a biomarker discovery study on spinal cord injury was conducted using serum samples from human clinical trials, aiming to distinguish severity grades and to predict neurological conversion as well as motor function recovery. Issues inherent to human samples, such as imbalanced sample sizes across groups, a wide age range, and a male-biased sex ratio, had to be addressed before statistical analysis. Support vector machine models were built to discover potential biomarkers.

  • Subjects / Keywords
  • Graduation date
    Spring 2022
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.