Gene Set Reduction for a Continuous Phenotype

  • Author / Creator
    Yasmin, Farzana
  • Introduction: Gene set analysis (GSA) examines the association between predefined gene sets and a phenotype, and is of growing interest in DNA microarray studies. However, when a gene set is found to be significant, often not all of its genes are responsible for the significance. Identifying core subsets improves understanding of biological mechanisms and reduces costs from diagnosis to treatment. Few methods have been introduced to isolate core genes from significant gene sets. No existing method for a continuous phenotype eliminates redundant genes and reduces gene sets to core subsets that explain the observed association.
    Objective: Our research objective is to reduce gene sets associated with a continuous phenotype to subsets of genes that chiefly contribute to the association.
    Methods: Our method tests subsets of a differentially expressed gene set by gradually eliminating genes not associated with the phenotype. A computationally efficient method, the Linear Combination Test (LCT), is used to test the association between each gene set and the phenotype of interest. Within the significant gene sets, we use Significance Analysis of Microarrays (SAM) to order the genes by their individual association with the phenotype. LCT is then applied again to subsets formed from the ordered genes to obtain the most differentially expressed subset.
    Results: We studied the proposed method on a real microarray data set consisting of expression levels of 13,233 genes measured on 33 African-American prostate cancer patients, together with 1403 gene sets obtained from the C2 catalog of the Molecular Signatures Database (MSigDB). We report both individual-gene analysis and gene-set analysis of these data, using SAM and LCT respectively. LCT identified 30 statistically significant gene sets. We used our gene set reduction method to extract core subsets of genes and to calculate the percent reduction in each of the 30 sets. We also calculated the frequencies of core genes across all the significant sets.
    Conclusion: This work makes it possible to reduce gene sets to the genes that contribute most to disease. The approach may lead to faster and more cost-efficient diagnosis and treatment of chronic diseases by focusing only on the differentially expressed genes in the reduced sets.
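    The Methods paragraph above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not give the LCT statistic, the SAM statistic, or the exact stopping rule, so the code substitutes simple stand-ins and labels them as such. The set-level test is a permutation test on the sum of squared gene-phenotype correlations (in place of LCT), genes are ranked by absolute Pearson correlation (in place of SAM), and reduction stops once the discarded genes alone are no longer significant (an assumed rule). All function names, the `alpha` threshold, and the simulated data are hypothetical.

```python
import numpy as np


def gene_correlations(expr, phenotype):
    """Pearson correlation of each gene (row of expr) with the phenotype."""
    x = expr - expr.mean(axis=1, keepdims=True)
    y = phenotype - phenotype.mean()
    return (x @ y) / np.sqrt((x ** 2).sum(axis=1) * (y ** 2).sum())


def set_pvalue(expr, phenotype, rng, n_perm=200):
    """Permutation p-value for a gene set.

    Stand-in for LCT: the set score is the sum of squared gene-phenotype
    correlations, and the null is built by permuting the phenotype.
    """
    observed = (gene_correlations(expr, phenotype) ** 2).sum()
    hits = 0
    for _ in range(n_perm):
        permuted = rng.permutation(phenotype)
        if (gene_correlations(expr, permuted) ** 2).sum() >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def reduce_gene_set(expr, phenotype, alpha=0.05, n_perm=200, seed=0):
    """Return indices of a hypothetical core subset, strongest gene first.

    Returns an empty array when the full set is not significant.
    """
    rng = np.random.default_rng(seed)
    if set_pvalue(expr, phenotype, rng, n_perm) > alpha:
        return np.array([], dtype=int)
    # Stand-in for the SAM ordering: rank genes by |correlation|.
    order = np.argsort(-np.abs(gene_correlations(expr, phenotype)))
    for k in range(1, len(order)):
        # Assumed stopping rule: stop once the discarded genes alone
        # no longer show a significant association.
        if set_pvalue(expr[order[k:]], phenotype, rng, n_perm) > alpha:
            return order[:k]
    return order


# Simulated data only (not the study data): 10 genes on 33 samples,
# where genes 0-2 track the phenotype and genes 3-9 are pure noise.
rng = np.random.default_rng(1)
phenotype = rng.normal(size=33)
noise = rng.normal(size=(10, 33))
expr = noise.copy()
expr[:3] = 2.0 * phenotype + 0.5 * noise[:3]

core = reduce_gene_set(expr, phenotype)
reduction = 100 * (1 - len(core) / expr.shape[0])
print("core genes:", core, f"percent reduction: {reduction:.0f}%")
```

    The percent reduction printed at the end is computed as 100 × (1 − core size / original set size), which is one plausible reading of the "percent reduction" reported per gene set; the abstract does not define it explicitly.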

  • Degree
    Master of Science
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Institution
    University of Alberta
  • Department
    • Department of Public Health Sciences
  • Specialization
    • Epidemiology
  • Supervisor / co-supervisor and their department(s)
    • Dinu, Irina (Department of Public Health Sciences)
  • Examining committee members and their departments
    • Gombay, Edit (Department of Mathematical & Statistical Sciences)
    • Yuan, Yan (Department of Public Health Sciences)