Decomposition and Feature Selection of Comprehensive 2-Dimensional Gas Chromatography - Time-of-Flight Mass Spectrometry (GC×GC-TOFMS) Data

  • Author / Creator
    Armstrong, Michael D S
  • Comprehensive Two-Dimensional Gas Chromatography - Time-of-Flight Mass Spectrometry (GC×GC-TOFMS) is an advanced instrumental technique that separates complex mixtures along two chromatographic dimensions, followed by multivariate detection that collects mass spectral information at a high acquisition rate. GC×GC-TOFMS improves upon the sensitivity and selectivity of traditional Gas Chromatography - Mass Spectrometry (GC-MS), and as such many more chemicals can be identified and quantified within a much shorter span of time.
    Current commercial offerings and some academic works have largely focused on capitalising upon the sensitivity and selectivity of GC×GC-TOFMS in order to find more chemical components per chromatogram, often by removing interfering noise from the signal and digging far into the Signal-to-Noise Ratio (SNR). For experiments where it is necessary to correlate some observable characteristic of the samples being analysed with the chemical information available in the GC×GC-TOFMS chromatograms, this usually creates far more features than samples. This is a common problem in the practice of chemometrics, and there are a number of feature selection routines and rank-deficient solutions to the inverse least squares problem that can correct for this imbalance between variables and samples. However, a problem arises when these features are poorly integrated and/or associated across multiple samples. This has been a persistent and known problem within the chromatography community for years, and while it remains an active area of research, little has been done to develop an algorithm that properly quantifies and identifies these chemical components without excessive programmatic steps that are prone to failure.
    The main issue surrounding this problem is that chemical components often drift between runs along both their first- and second-dimension retention modes. Although chemometricians have been using Parallel Factor Analysis 2 (PARAFAC2) to model chromatographic drift along one mode for decades, no algorithm has thus far been developed to handle drift in two modes in a similarly mathematically satisfying way.
    In this work, I present improvements to the Feature Selection by Cluster Resolution (FS-CR) algorithm that enable high-quality information to be extracted from peak tables containing a number of integration artefacts, such that many more combinations of data can be analysed in a much shorter span of time, generally improving upon the feature selection routine. This algorithm was tested on a number of datasets, most of which were created during the course of this research. Following this, a parsimonious solution for the analysis of GC×GC-TOFMS data with drift in two modes, named PARAFAC2×2, is proposed. Within a particular region of the chromatogram, this algorithm appears capable of deconvolving components with drift that varies across each sample independently, under close to the worst conditions possible. To the end of creating a parameter-free pre-processing routine for entire chromatograms, a novel method for predicting the chemical rank of a matrix is proposed. This may enable automated, parameter-free processing of raw GC×GC-TOFMS data in the near future.

  • Subjects / Keywords
  • Graduation date
    Fall 2021
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-q9sf-s571
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.