The Budgeted Biomarker Discovery Problem

Khan, Sheehan Veikko

doi:10.7939/R3QF8JQ6T

Researchers conduct association studies to discover biomarkers in order to gain new biological insight into complex diseases and phenotypes. Although most researchers have intuitions about what defines a biomarker and how to assess the results of an association study, there is neither a formal definition of what a biomarker is nor an objective goal for association studies. As a result, the literature is full of association studies with conflicting results – e.g., studies on the same phenotype that produce lists of biomarkers with little to no overlap.

This thesis presents the “Budgeted Biomarker Discovery (BBD) problem”, which clearly defines (1) what a biomarker is, and (2) rewards for correctly identifying biomarkers and penalties for incorrectly identifying biomarkers. Furthermore, the BBD problem allows researchers to use a mixture of high- and low-throughput technologies. In the context of discovering biomarkers from gene expression data, we show how future association studies can use both microarray and qPCR data to objectively find the genes that are biomarkers in a cost-efficient manner.

We present several algorithms for solving the BBD problem, and show that good algorithms must make use of both microarrays and qPCR. Also, they must be able to adapt to the data as it is collected. For example, when solving a new BBD problem, we must begin by collecting microarrays because we do not yet know how many biomarkers we expect to identify, or which qPCR arrays would be most informative. Thus, we use the high-throughput microarrays to survey the problem, until we can identify which specific low-throughput qPCR arrays to use for focusing on those genes that are potentially biomarkers. To identify when this transition should occur, we present the problem of estimating the density of univariate statistics in high-throughput data, and we present our Fused Density Estimation (FDE) algorithm as a solution. We use FDE as the backbone of our adaptive algorithms for solving BBD problems. In a series of experiments on real microarray data and realistic synthetic data, we show that our BBD1 algorithm is the most robust solution, amongst those considered, to the BBD problem.
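
The survey-then-focus idea described above can be illustrated with a toy sketch. This is not the thesis's BBD1 or FDE algorithm: the function name `two_stage_discovery`, the costs, the fixed switch point `n_survey`, the shortlist size, the threshold, and the assumption that one qPCR run measures a single gene are all hypothetical choices made only for illustration. In particular, the thesis decides adaptively when to switch technologies, whereas this sketch switches after a fixed number of microarrays.

```python
import random
from statistics import mean


def two_stage_discovery(true_effects, budget, array_cost=10.0, qpcr_cost=1.0,
                        n_survey=3, shortlist_size=2, threshold=0.5,
                        noise_sd=0.0, seed=0):
    """Toy two-stage search: survey all genes, then focus on a shortlist.

    Every parameter here is a hypothetical stand-in; the fixed switch point
    (n_survey) replaces the adaptive decision described in the abstract.
    """
    rng = random.Random(seed)
    genes = list(true_effects)
    readings = {g: [] for g in genes}

    # Stage 1: each microarray measures every gene once (high-throughput).
    n_arrays = min(n_survey, int(budget // array_cost))
    for _ in range(n_arrays):
        for g in genes:
            readings[g].append(true_effects[g] + rng.gauss(0.0, noise_sd))
    remaining = budget - n_arrays * array_cost
    if n_arrays == 0:
        return [], remaining  # budget too small for even one survey array

    # Shortlist the genes with the largest mean absolute effect so far.
    ranked = sorted(genes, key=lambda g: abs(mean(readings[g])), reverse=True)
    shortlist = ranked[:shortlist_size]

    # Stage 2: each qPCR run measures one shortlisted gene (low-throughput),
    # cycling through the shortlist until the budget is exhausted.
    i = 0
    while remaining >= qpcr_cost and shortlist:
        g = shortlist[i % len(shortlist)]
        readings[g].append(true_effects[g] + rng.gauss(0.0, noise_sd))
        remaining -= qpcr_cost
        i += 1

    # Report shortlisted genes whose mean absolute effect passes the threshold.
    found = [g for g in shortlist if abs(mean(readings[g])) >= threshold]
    return sorted(found), remaining
```

With noise-free readings, calling `two_stage_discovery({"g1": 2.0, "g2": 0.0, "g3": -1.5, "g4": 0.1}, budget=35.0)` spends the budget on three survey arrays and five qPCR runs and reports `g1` and `g3`. Raising `noise_sd` shows why the point at which to stop surveying matters: too few arrays can leave true biomarkers off the shortlist, while too many leave little budget for follow-up.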
Graduation date

Fall 2015
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/R3QF8JQ6T
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Greiner, Russell (Computing Science)
Examining committee members and their departments
- Zaiane, Osmar (Computing Science)
- Wishart, David (Computing Science)
- Greiner, Russell (Computing Science)
- Baracos, Vickie (Oncology)
- Tiwari, Hemant (Biostatistics)