The Budgeted Biomarker Discovery Problem

  • Author / Creator
    Khan, Sheehan Veikko
  • Researchers conduct association studies to discover biomarkers in order to gain new biological insight into complex diseases and phenotypes. Although most researchers have intuitions about what defines a biomarker and how to assess the results of an association study, there is neither a formal definition of what a biomarker is, nor an objective goal for association studies. As a result, the literature is full of association studies with conflicting results: for example, studies on the same phenotype that produce lists of biomarkers with little to no overlap. This thesis presents the “Budgeted Biomarker Discovery (BBD) problem”, which clearly defines (1) what a biomarker is, and (2) rewards for correctly identifying biomarkers and penalties for incorrectly identifying them. Furthermore, the BBD problem allows researchers to use a mixture of high- and low-throughput technologies. In the context of discovering biomarkers from gene expression data, we show how future association studies can use both microarray and qPCR data to objectively find the genes that are biomarkers in a cost-efficient manner. We present several algorithms for solving the BBD problem, and show that good algorithms must make use of both microarrays and qPCR. They must also be able to adapt to the data as it is collected. For example, when solving a new BBD problem, we must begin by collecting microarrays because we do not yet know how many biomarkers we expect to identify, or which qPCR arrays would be most informative. Thus, we use the high-throughput microarrays to survey the problem until we can identify which specific low-throughput qPCR arrays to use for focusing on those genes that are potentially biomarkers. To identify when this transition should occur, we present the problem of estimating the density of univariate statistics in high-throughput data, and we present our Fused Density Estimation (FDE) algorithm as a solution. We use FDE as the backbone of our adaptive algorithms for solving BBD problems. In a series of experiments on real microarray data and realistic synthetic data, we show that our BBD1 algorithm is the most robust solution to the BBD problem among those considered.

  • Subjects / Keywords
  • Graduation date
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
    • Department of Computing Science
  • Supervisor / co-supervisor and their department(s)
    • Greiner, Russell (Computing Science)
  • Examining committee members and their departments
    • Zaiane, Osmar (Computing Science)
    • Wishart, David (Computing Science)
    • Greiner, Russell (Computing Science)
    • Baracos, Vickie (Oncology)
    • Tiwari, Hemant (Biostatistics)