Data Reduction and Feature Selection for Chemometric Analysis

  • Author / Creator
    Adutwum, Lawrence A
  • Advancements in data acquisition technologies and the desire for rich data have led to an increase in the size of data sets collected from modern analytical instruments. With the aid of chemometric techniques, researchers can still glean more useful information from these data than they can with conventional interpretation tools. Chemometric models also benefit immensely from feature selection methods that eliminate redundant information. To make these feature selection methods efficient, strategies that reduce the size of the data prior to their implementation are also desirable. However, reducing the data volume carries a risk of information loss or distortion. In chromatography, where multivariate detectors such as mass spectrometers are used, currently available data reduction methods generally resort to eliminating a dimension of the data.
This dissertation presents new approaches to data size reduction for chromatographic data acquired with multivariate detectors. The Unique Ion Filter (UIF) was developed as a strategy for reducing data size without altering the multivariate nature of the data or the chemical information it contains. Two variants were developed: UIF1D for one-dimensional chromatography and UIF2D for comprehensive two-dimensional chromatography, in both cases with multivariate detection. Both were successfully applied to complex data sets and proved useful. Segmented total ion spectrum (STIS) was also developed to achieve data reduction with partial preservation of retention information for gas chromatography data. STIS is presented as an alignment-free data reduction method that allows inter-laboratory comparison of chromatograms, provided the same anchor compounds are used. Cluster resolution feature selection (CR-FS) was developed as an objective feature selection algorithm.
Until now, however, no guidance existed for determining the two main parameters needed to fully automate CR-FS, which has prevented truly automated implementation of the algorithm. This dissertation also develops an empirical approach to guide the selection of these two critical parameters. Applications of feature selection tools beyond the realm of chromatography are explored as well. X-ray crystallographers wish to predict the crystal structures of crystalline compounds from their elemental compositions. A machine learning approach to this problem was explored, using CR-FS to determine the elemental properties that can guide such predictions. Rapid identification of microorganisms is highly desirable, and the task becomes more difficult as one moves down the taxonomic ranks. Feature selection with CR-FS, combined with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOFMS) data, presents an opportunity for a high-throughput, automated method of bacterial identification. The potential of this approach is also explored.

  • Subjects / Keywords
  • Graduation date
    Fall 2017
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Citation for previous publication
    • Oliynyk, A. O., Adutwum, L. A., Harynuk, J. J., & Mar, A. Classifying Crystal Structures of Binary Compounds AB through Cluster Resolution Feature Selection and Support Vector Machine Analysis. Chemistry of Materials 28.18 (2016): 6672-6681.
    • Adutwum, L. A., & Harynuk, J. J. Unique Ion Filter: A Data Reduction Tool for GC/MS Data Preprocessing Prior to Chemometric Analysis. Analytical Chemistry 86.15 (2014): 7726-7733.
  • Institution
    University of Alberta
  • Degree level
  • Department
  • Supervisor / co-supervisor and their department(s)
  • Examining committee members and their departments
    • Bouchard, Vincent (Mathematical and Statistical Sciences)
    • Rutan, Sarah (Chemistry, Virginia Commonwealth University)
    • Lucy, Charles (Chemistry)
    • Le, Chris X. C. (Chemistry)