Gene Set Reduction for a Continuous Phenotype

Yasmin, Farzana

doi:10.7939/R38912014

Introduction: Gene set analysis (GSA) examines the association between predefined gene sets and a phenotype, and it is a topic of growing interest in DNA microarray studies. However, when a gene set is found to be significant, often not all of its genes are responsible for that significance. Identifying core subsets improves understanding of biological mechanisms and reduces costs from diagnosis to treatment. Few methods have been introduced to isolate core genes from significant gene sets. For a continuous phenotype, there are no methods that eliminate redundant genes and effectively reduce the gene sets to core subsets that explain the observed association.

Objective: Our research objective is to reduce gene sets associated with a continuous phenotype to subsets of genes that chiefly contribute to the association.

Methods: Our method tests subsets of a differentially expressed gene set by gradually eliminating genes not associated with the phenotype. A computationally efficient method, the Linear Combination Test (LCT), is used to test the association between each gene set and the phenotype of interest. Within each significant gene set, we used Significance Analysis of Microarrays (SAM) to order the genes by their individual association with the phenotype. LCT is then applied again to subsets formed from the ordered genes to find the most differentially expressed subset.
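
The reduction loop described in the Methods paragraph can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the absolute Pearson correlation stands in for the SAM ordering, a permutation test on the sum of squared correlations stands in for LCT, and the stopping rule (keep the subset with the smallest p-value, ties going to the smaller subset) is an assumption, since the abstract does not state one. All function names are hypothetical.

```python
import numpy as np


def gene_correlations(X, y):
    """Pearson correlation of each column of X (genes) with y (phenotype).

    Assumes no gene has constant expression across samples.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (Xc.T @ yc) / denom


def rank_genes(X, y):
    """Order genes from strongest to weakest |correlation| (stand-in for SAM)."""
    return np.argsort(-np.abs(gene_correlations(X, y)))


def set_statistic(X, y):
    """Set-level score: sum of squared correlations (stand-in for LCT)."""
    return float((gene_correlations(X, y) ** 2).sum())


def permutation_pvalue(X, y, n_perm, rng):
    """Permutation p-value obtained by shuffling the phenotype labels."""
    observed = set_statistic(X, y)
    hits = sum(
        set_statistic(X, rng.permutation(y)) >= observed for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


def reduce_gene_set(X, y, n_perm=200, seed=0):
    """Drop the weakest-ranked gene step by step and keep the best subset.

    Returns the column indices of the retained genes and their p-value.
    """
    rng = np.random.default_rng(seed)
    order = rank_genes(X, y)
    best_k, best_p = len(order), np.inf
    for k in range(len(order), 0, -1):
        p = permutation_pvalue(X[:, order[:k]], y, n_perm, rng)
        if p <= best_p:  # assumed rule: ties favour the smaller subset
            best_k, best_p = k, p
    return order[:best_k], best_p
```

Here `X` is a samples-by-genes expression matrix for one significant gene set and `y` is the continuous phenotype. Because the genes are ranked on the same data that is later tested, the p-value of the reduced subset is optimistic; a real analysis would need to account for that selection step.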

Results: We studied our proposed method using a real microarray data set consisting of expression levels of 13,233 genes measured on 33 African-American prostate cancer patients, together with 1403 gene sets obtained from the C2 catalog of the Molecular Signatures Database (http://www.broadinstitute.org/gsea/msigdb). We showed results of both individual gene analysis and gene set analysis on these data using SAM and LCT, respectively. LCT identified 30 statistically significant gene sets. We used our gene reduction method to extract core subsets of genes and calculated the percent reduction in each of the 30 sets. We also calculated the frequencies of core genes across all the significant sets.
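
The two summaries mentioned in the Results paragraph, percent reduction per set and core gene frequency across sets, can be computed along these lines. This is a sketch under assumptions: the abstract does not define either quantity, so percent reduction is taken as the share of genes removed from the original set, and frequency as the number of reduced sets in which a gene appears. The function names and the input layout are hypothetical.

```python
from collections import Counter


def percent_reduction(original_size, core_size):
    """Share of genes removed when a set is reduced to its core subset (assumed definition)."""
    return 100.0 * (original_size - core_size) / original_size


def core_gene_frequencies(core_sets):
    """Count in how many reduced sets each gene appears.

    core_sets maps a gene set name to an iterable of its core gene identifiers.
    """
    counts = Counter()
    for genes in core_sets.values():
        counts.update(set(genes))  # count each gene at most once per set
    return counts
```

Calling `core_gene_frequencies(...).most_common()` would list the genes that recur most often across the reduced sets.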

Conclusion: This work enables us to effectively reduce the gene sets to the most important genes that contribute to disease. This approach may bring faster and more cost-efficient diagnosis and treatment of chronic diseases by focusing only on differentially expressed genes in the reduced sets.
Graduation date

Spring 2014
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/R38912014
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Public Health Sciences
Specialization
- Epidemiology
Supervisor / co-supervisor and their department(s)
- Dinu, Irina (Department of Public Health Sciences)
Examining committee members and their departments
- Gombay, Edit (Department of Mathematical & Statistical Sciences)
- Yuan, Yan (Department of Public Health Sciences)