Analyzing Biomarker Discovery: Estimating the Reproducibility of Biomarkers

  • Author / Creator
    Forouzandehmoghadam, Amirhosein
  • A biomarker is a feature (e.g., a gene expression level or a SNP) that differs significantly between two classes of instances, typically case and control. Knowing these biomarkers can help us understand a biological condition or identify the appropriate treatment for a certain disease. Many researchers try to identify biomarkers by applying univariate hypothesis testing to a labeled dataset, selecting a feature if it differs statistically significantly between the classes. However, such sets of proposed biomarkers are often not reproducible: subsequent studies typically fail to identify the same sets; indeed, there is often very little overlap between the biomarkers proposed by pairs of related studies that explore the same phenotypes over the same distribution of subjects. This dissertation first defines the Reproducibility Score for a labeled dataset as a measure (in [0,1]) of the reproducibility of the results produced by the specified biomarker discovery process for this distribution of subjects. We then provide ways to reliably estimate this score for a given dataset: an over-bound, an under-bound and a middle-value approximation. These specific tools apply to univariate hypothesis testing on dichotomous groups. We confirm that these approximations are meaningful by providing empirical results for many datasets (microarray, RNAseq and SNP), and show that these predictions match known reproducibility results. Finally, using real datasets, we explore how changing some of the settings of a biomarker discovery process (such as the p-value threshold, p-value correction method and sample size) affects the results and the Reproducibility Score.
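The kind of discovery process the abstract describes, and one way to probe its reproducibility, can be sketched as follows. This is an illustrative sketch only, not the thesis's definition of the Reproducibility Score or any of its estimators. Everything here is an assumption made for the example: the function names, the normal approximation to a two-sample test, the Bonferroni correction, the use of Jaccard overlap between random half-splits as the overlap measure, and the synthetic data.

```python
import random
from statistics import NormalDist, mean, variance


def select_biomarkers(cases, controls, alpha=0.05):
    """Return indices of features flagged by a univariate two-sample test.

    Assumption: a normal approximation to the Welch statistic with a
    Bonferroni-corrected threshold; each group needs at least 2 rows.
    """
    n_features = len(cases[0])
    threshold = alpha / n_features  # Bonferroni correction
    selected = set()
    for j in range(n_features):
        a = [row[j] for row in cases]
        b = [row[j] for row in controls]
        se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
        if se == 0:
            continue  # constant feature: no evidence either way
        z = (mean(a) - mean(b)) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
        if p < threshold:
            selected.add(j)
    return selected


def jaccard(s, t):
    """Overlap in [0, 1]; two empty sets score 0 by convention here."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def split_half_overlap(cases, controls, n_repeats=20, alpha=0.05, seed=0):
    """Average Jaccard overlap of biomarker sets from random disjoint halves."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        ca, co = cases[:], controls[:]
        rng.shuffle(ca)
        rng.shuffle(co)
        hc, hk = len(ca) // 2, len(co) // 2
        first = select_biomarkers(ca[:hc], co[:hk], alpha)
        second = select_biomarkers(ca[hc:], co[hk:], alpha)
        scores.append(jaccard(first, second))
    return mean(scores)


# Synthetic demo data (hypothetical): 10 features, only feature 0 differs.
_rng = random.Random(1)
demo_controls = [[_rng.gauss(0, 1) for _ in range(10)] for _ in range(40)]
demo_cases = [[_rng.gauss(5 if j == 0 else 0, 1) for j in range(10)]
              for _ in range(40)]
demo_score = split_half_overlap(demo_cases, demo_controls)
```

Because each half contains only half of the subjects, an overlap computed this way reflects the discovery process at a reduced sample size, which is one reason a split-based number should be read as a rough indication rather than as the score the thesis defines.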

  • Subjects / Keywords
  • Graduation date
    Spring 2019
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-64zw-q909
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.