Data Reduction and Feature Selection for Chemometric Analysis
- Author / Creator: Adutwum, Lawrence A
Advances in data acquisition technologies and the desire for rich data have led to an increase in the size of the data sets collected by modern analytical instruments. With the aid of chemometric techniques, researchers can still extract more useful information from such data than they can with conventional interpretation tools. Chemometric models also benefit immensely from feature selection methods that eliminate redundant information. To make these feature selection methods efficient, strategies that reduce the size of the data before the methods are applied are also desirable. However, any attempt to reduce data volume carries a risk of information loss or distortion. In chromatography with multivariate detectors such as mass spectrometers, currently available data reduction methods generally resort to eliminating a dimension of the data.
This dissertation presents new approaches to reducing the size of chromatographic data acquired with multivariate detectors. The Unique Ion Filter (UIF) was developed as a data reduction strategy that reduces data size without altering either the multivariate nature of the data or the chemical information it contains. Two variants were developed: UIF1D for one-dimensional chromatography and UIF2D for comprehensive two-dimensional chromatography, both with multivariate detection. UIF1D and UIF2D were successfully applied to complex data and proved very useful. The segmented total ion spectrum (STIS) was also developed to reduce gas chromatography data while partially preserving retention information. STIS is presented as an alignment-free data reduction method that allows inter-laboratory comparison of chromatograms as long as the same anchor compounds are used.
Cluster resolution feature selection (CR-FS) was developed as an objective feature selection algorithm. Until now, however, there has been no guidance for determining the two main parameters needed to fully automate CR-FS.
The lack of such guidance has prevented true automation of CR-FS, so this dissertation also develops an empirical approach to guide the selection of these two critical parameters.
Applications of feature selection tools beyond chromatography are also explored. X-ray crystallographers would like to be able to predict the crystal structure of crystalline compounds from their elemental composition; a machine learning approach to this problem was explored, using CR-FS to identify elemental properties that can guide such predictions. Rapid identification of microorganisms is highly desirable, but the task becomes more difficult further down the taxonomic ranks. Combining CR-FS with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOFMS) data offers an opportunity for a high-throughput, automated method of bacterial identification, and the potential of this approach is also explored.
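The STIS data reduction mentioned in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the dissertation's exact procedure. The sketch assumes that the retention times of the anchor compounds observed in each run mark the segment boundaries, that all mass spectra within a segment are summed into one spectrum, and that the per-segment sums are concatenated into a single feature vector. The function name `segmented_total_ion_spectrum` and its arguments are hypothetical. The boundaries come from anchors located in each individual chromatogram, so a run with drifting retention times still places each compound in the same segment, provided the drift does not move it past an anchor. This is what makes an alignment step unnecessary.

```python
import bisect
from typing import List, Sequence


def segmented_total_ion_spectrum(
    retention_times: Sequence[float],
    spectra: Sequence[Sequence[float]],
    anchor_times: Sequence[float],
) -> List[float]:
    """Hypothetical STIS sketch: sum the mass spectra inside each
    retention-time segment bounded by anchor compounds, then concatenate.

    retention_times: retention time of each scan (same order as spectra)
    spectra: one intensity vector per scan, all on the same m/z axis
    anchor_times: retention times of the anchor compounds in THIS run
    """
    if not spectra or len(retention_times) != len(spectra):
        raise ValueError("need a non-empty spectrum list matching retention_times")
    n_mz = len(spectra[0])
    boundaries = sorted(anchor_times)
    # k anchors split the run into k + 1 segments.
    segments = [[0.0] * n_mz for _ in range(len(boundaries) + 1)]
    for t, spectrum in zip(retention_times, spectra):
        if len(spectrum) != n_mz:
            raise ValueError("all spectra must share the same m/z axis")
        # Assumption: a scan at exactly an anchor time falls in the later segment.
        seg = segments[bisect.bisect_right(boundaries, t)]
        for j, intensity in enumerate(spectrum):
            seg[j] += intensity
    # Concatenate segment sums: segment 0 m/z channels, then segment 1, and so on.
    return [value for seg in segments for value in seg]


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0]
    scans = [[1, 0], [0, 1], [2, 2], [1, 1]]
    # One anchor at 2.5 gives two segments of two m/z channels each.
    print(segmented_total_ion_spectrum(times, scans, anchor_times=[2.5]))
    # prints [1.0, 1.0, 3.0, 3.0]
```

With no anchors, the function returns the ordinary total ion spectrum, which is the sum of every scan. Each added anchor contributes one more block of m/z channels. The feature count therefore grows with the number of segments, while the coarse elution order is retained.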
- Subjects / Keywords:
- Bacterial Strain Prediction
- Support Vector Machines
- Partial Least Squares Discriminant Analysis
- Chemometrics
- Feature Selection
- Forensics
- Principal Component Analysis
- Gas Chromatography
- Data Reduction
- Crystal Structure Prediction
- Mass Spectrometry
- Segmented Total Ion Spectrum
- Unique Ion Filter
- Classification
- Cluster Resolution
- Graduation date: Fall 2017
- Type of Item: Thesis
- Degree: Doctor of Philosophy
- License: This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.