






This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations

Gene Set Reduction for a Continuous Phenotype Open Access


Subject/Keyword
Significance analysis of microarrays
Linear combination test
Microarray study
Gene set reduction
Gene set analysis
Degree grantor
University of Alberta
Author or creator
Yasmin, Farzana
Supervisor and department
Dinu, Irina (Department of Public Health Sciences)
Examining committee members and departments
Gombay, Edit (Department of Mathematical & Statistical Sciences)
Yuan, Yan (Department of Public Health Sciences)
Department
Department of Public Health Sciences
Degree
Master of Science
Abstract

Introduction: Gene set analysis (GSA) examines the association between predefined gene sets and a phenotype and is of growing interest in DNA microarray studies. However, when a gene set is identified as significant, often not all of its genes are responsible for the significance. Identifying core subsets improves understanding of biological mechanisms and can reduce costs from diagnosis to treatment. Few methods have been introduced to isolate core genes from significant gene sets, and there are no methods for a continuous phenotype that eliminate redundant genes and effectively reduce gene sets to the core subsets explaining the observed association.

Objective: Our research objective is to reduce gene sets associated with a continuous phenotype to the subsets of genes that chiefly contribute to the association.

Methods: Our method tests subsets of a differentially expressed gene set by gradually eliminating genes not associated with the phenotype. A computationally efficient method, the Linear Combination Test (LCT), is used to test the association between each gene set and the phenotype of interest. Within each significant gene set, Significance Analysis of Microarrays (SAM) is used to rank individual genes by their association with the phenotype. LCT is then applied again to subsets built from the ordered genes to obtain the most differentially expressed subset.

Results: We studied our proposed method using a real microarray data set consisting of expression levels of 13,233 genes measured on 33 African-American prostate cancer patients and 1,403 gene sets obtained from the C2 catalog of the Molecular Signatures Database. We show results of both individual-gene analysis and gene set analysis on these data, using SAM and LCT, respectively. LCT identified 30 statistically significant gene sets. We used our gene reduction method to extract core subsets of genes and calculated the percent reduction in each of the 30 sets. We also calculated the frequencies of core genes among all the significant sets.

Conclusion: This work enables us to effectively reduce gene sets to the most important genes contributing to disease. By focusing only on the differentially expressed genes in the reduced sets, this approach may lead to faster and more cost-efficient diagnosis and treatment of chronic diseases.
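The gene-ranking step described in the Methods can be sketched as follows. This is a minimal illustration, assuming a SAM-style score for a quantitative phenotype: the least-squares slope of each gene's expression on the phenotype, divided by its standard error plus a fudge factor `s0`. The function names, the fixed `s0` argument (SAM itself estimates `s0` from the data) and the toy data are hypothetical; this is not a reimplementation of the SAM software.

```python
import math


def sam_quantitative_scores(expr, phenotype, s0=0.0):
    """SAM-style score of each gene against a continuous phenotype (sketch).

    expr: one list of expression values per gene, samples in the same order
          as `phenotype`.
    s0:   fudge factor added to the standard error; passed in here
          (hypothetical), whereas SAM chooses it from the data.
    """
    n = len(phenotype)
    z_mean = sum(phenotype) / n
    zc = [z - z_mean for z in phenotype]  # centred phenotype
    szz = sum(c * c for c in zc)          # phenotype sum of squares
    scores = []
    for gene in expr:
        x_mean = sum(gene) / n
        # Least-squares slope of expression on the phenotype.
        slope = sum(c * (x - x_mean) for c, x in zip(zc, gene)) / szz
        # Residual sum of squares around the fitted line (n - 2 df).
        rss = sum((x - x_mean - slope * c) ** 2 for c, x in zip(zc, gene))
        se = math.sqrt(rss / (n - 2)) / math.sqrt(szz)
        scores.append(slope / (se + s0))
    return scores


def rank_genes(names, scores):
    """Order gene names from strongest to weakest absolute score."""
    order = sorted(range(len(names)), key=lambda i: abs(scores[i]), reverse=True)
    return [names[i] for i in order]


# Toy data (hypothetical): 5 samples, 3 genes.
phenotype = [1.0, 2.0, 3.0, 4.0, 5.0]
expr = [
    [2.1, 3.9, 6.2, 7.8, 10.1],  # rises with the phenotype
    [5.0, 4.8, 5.1, 5.2, 4.9],   # roughly flat
    [9.0, 7.1, 4.9, 3.2, 1.0],   # falls with the phenotype
]
scores = sam_quantitative_scores(expr, phenotype, s0=0.0)
ranked = rank_genes(["g1", "g2", "g3"], scores)
print(ranked)  # → ['g3', 'g1', 'g2']
```

The sign of the score keeps the direction of association, while ranking uses its absolute value, so strongly negatively associated genes rank as high as positively associated ones.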
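The reduction step described in the Methods can be sketched as a search over nested subsets of ranked genes. The sketch assumes genes are already ordered from strongest to weakest association, tests the top k genes for k = 1, ..., m (equivalently, drops genes from the weak end), and keeps the subset with the smallest permutation p-value. The gene set statistic is a plainly swapped-in stand-in (the sum of squared gene-phenotype Pearson correlations), not the LCT statistic; the selection rule, function names and data are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def set_statistic(genes, phenotype):
    """Stand-in gene set statistic (NOT the LCT statistic): the sum of
    squared gene-phenotype correlations over the genes in the set."""
    return sum(pearson(g, phenotype) ** 2 for g in genes)


def permutation_pvalue(genes, phenotype, n_perm=499, seed=0):
    """Permute phenotype values across samples; genes stay together, so the
    gene-gene correlation structure is preserved."""
    rng = random.Random(seed)
    observed = set_statistic(genes, phenotype)
    perm = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if set_statistic(genes, perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def reduce_gene_set(ranked_names, expr_by_name, phenotype, n_perm=499, seed=0):
    """Test the top-k genes for k = 1..m and keep the subset with the smallest
    p-value (ties go to the smaller subset). Hypothetical selection rule."""
    best_subset, best_p = None, float("inf")
    for k in range(1, len(ranked_names) + 1):
        subset = ranked_names[:k]
        p = permutation_pvalue([expr_by_name[g] for g in subset], phenotype, n_perm, seed)
        if p < best_p:
            best_subset, best_p = subset, p
    return best_subset, best_p


# Toy data (hypothetical): 10 samples, genes assumed already ranked strongest first.
phenotype = [float(i) for i in range(1, 11)]
expr_by_name = {
    "g1": [2.1, 3.8, 6.2, 8.1, 9.9, 12.2, 13.8, 16.1, 18.0, 19.9],
    "g2": [1.0, 1.5, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9, 9.1, 10.2],
    "g3": [5.1, 4.9, 5.3, 4.8, 5.0, 5.2, 4.7, 5.1, 4.9, 5.0],
    "g4": [3.0, 3.2, 2.9, 3.1, 3.3, 2.8, 3.0, 3.2, 2.9, 3.1],
}
core, p = reduce_gene_set(["g1", "g2", "g3", "g4"], expr_by_name, phenotype)
print(core, p)
```

Permuting the phenotype rather than individual genes keeps the correlation between genes in a set intact, which is the usual reason gene set tests use phenotype permutation.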
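The two summary quantities reported in the Results, the percent reduction of each significant set and the frequency of core genes across sets, come down to simple arithmetic. A minimal sketch, assuming percent reduction means the share of a set's genes removed by the reduction; the function names, set names and gene names are hypothetical.

```python
from collections import Counter


def percent_reduction(original_size, core_size):
    """Share of a gene set's genes removed by the reduction, in percent."""
    return 100.0 * (original_size - core_size) / original_size


def core_gene_frequencies(core_subsets):
    """Number of reduced (core) subsets in which each gene appears.

    core_subsets: mapping of gene set name -> list of core genes.
    """
    return Counter(gene for genes in core_subsets.values() for gene in set(genes))


# Hypothetical sets and genes, for illustration only.
print(percent_reduction(40, 10))  # → 75.0
freqs = core_gene_frequencies({"SET_A": ["geneA", "geneB"], "SET_B": ["geneB", "geneC"]})
print(freqs.most_common(1))  # → [('geneB', 2)]
```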

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

File Details

File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 699,481 bytes
Last modified: 2015-10-12 17:10:09 -06:00
Filename: Yasmin_Farzana_Spring 2014.pdf
Original checksum: 12fcdd47e9f116bcd01479f0e220da1f