This file is in the following communities:

Chemistry, Department of


This file is in the following collections:

Journal Articles (Chemistry)

Unique Ion Filter: A data reduction tool for GC/MS data preprocessing prior to chemometric analysis Open Access


Author or creator
Adutwum, Lawrence A.
Harynuk, James
Additional contributors
Subject/Keyword
Principal Component Analysis
Automotive Gasoline Samples
Gas Chromatography/Mass Spectrometry
Mass Spectrometry
Type of item
Journal Article (Published)
Using raw GC/MS data as the X-block for chemometric modeling has the potential to provide better classification models for complex samples than using the total ion current (TIC), extracted ion chromatograms/profiles (EIC/EIP), or integrated peak tables. However, the abundance of raw GC/MS data necessitates some form of data reduction/feature selection to remove the variables containing primarily noise from the data set. Several algorithms for feature selection exist; however, due to the extreme number of variables (10⁶–10⁸ variables per chromatogram), feature selection can be slow and computationally expensive. Herein, we present a new prefilter for automated data reduction of GC/MS data prior to feature selection. This tool, termed unique ion filter (UIF), is a module that can be added after chromatographic alignment and prior to any subsequent feature selection algorithm. The UIF objectively reduces the number of irrelevant or redundant variables in raw GC/MS data, while preserving potentially relevant analytical information. In the m/z dimension, data are reduced from a full spectrum to a handful of unique ions for each chromatographic peak. In the time dimension, data are reduced to only a handful of scans around each peak apex. UIF was applied to a data set of GC/MS data for a variety of gasoline samples to be classified using partial least-squares discriminant analysis (PLS-DA) according to octane rating. It was also applied to a series of chromatograms from casework fire debris analysis to be classified on the basis of whether or not signatures of gasoline were detected. By reducing the overall population of candidate variables subjected to subsequent variable selection, the UIF reduced the total feature selection time needed to achieve a perfect classification of all validation data from 373 to 9 min (98% reduction in computing time). Additionally, the significant reduction in included variables resulted in a concomitant reduction in noise, improving overall model quality. A minimum of two unique ions (um/z) and a scan window of three about the peak apex provided enough information about each peak for successful PLS-DA modeling of the data, as 100% model prediction accuracy was achieved. It is also shown that the application of UIF does not alter the underlying chemical information in the data.
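The two-dimensional reduction described in the abstract (a few ions per peak, a few scans around each apex) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how "unique" ions are chosen, so the sketch substitutes the most intense m/z channels at the apex as a stand-in, and it assumes peak apexes have already been located. The names `uif_mask`, `apply_filter`, `n_ions`, and `scan_window` are hypothetical.

```python
def uif_mask(data, apexes, n_ions=2, scan_window=3):
    """Return the set of (scan, mz_channel) variables kept by the sketch filter.

    data:   list of scans, each a list of intensities per m/z channel.
    apexes: scan indices of peak apexes (assumed to be known already).
    """
    n_scans = len(data)
    half = scan_window // 2
    keep = set()
    for apex in apexes:
        spectrum = data[apex]
        # ASSUMPTION: "unique" ions are approximated by the n_ions most
        # intense channels at the apex; the paper's criterion may differ.
        ranked = sorted(range(len(spectrum)),
                        key=lambda j: spectrum[j], reverse=True)
        ions = ranked[:n_ions]
        # Keep only the scans in a window centred on the apex,
        # clipped to the ends of the chromatogram.
        first = max(0, apex - half)
        last = min(n_scans - 1, apex + half)
        for scan in range(first, last + 1):
            for j in ions:
                keep.add((scan, j))
    return keep


def apply_filter(data, keep):
    """Flatten the retained variables into a feature vector (sorted order)."""
    return [data[s][j] for (s, j) in sorted(keep)]


# Toy chromatogram: 7 scans x 5 m/z channels, one peak with its apex at scan 3.
data = [
    [0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 5, 1, 9, 0],
    [1, 20, 0, 30, 2],
    [0, 6, 1, 8, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
]
keep = uif_mask(data, apexes=[3], n_ions=2, scan_window=3)
features = apply_filter(data, keep)
total = len(data) * len(data[0])
print(f"kept {len(keep)} of {total} variables")
```

With two ions and a three-scan window, the toy example keeps 6 of 35 variables. For aligned chromatograms, one plausible design is to compute a single mask and apply it to every sample so that all feature vectors share the same variables before feature selection and PLS-DA.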
License information
This article has been published as "ACS AuthorChoice - This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes."
File Details

File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 2196368
Last modified: 2018:02:23 12:27:05-07:00
Filename: AC_86_15_7726.pdf
Original checksum: b8742d4fe0a76920f421e4e4bf75d905
Well formed: false
Valid: false
Status message: Invalid Resources Entry in document offset=463668
Status message: Unexpected error in findFonts java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary offset=5782
Status message: Invalid name tree offset=2191586
File title: AC_86_15_7726