  • http://hdl.handle.net/10402/era.24985
  • Estimating the Overlap of Top Instances in Lists Ranked by Correlation to Label
  • Damavandi, Babak
  • English
  • Machine Learning
    Bioinformatics
    Gene signatures
    Genome wide association studies
    GWAS
    Microarray
  • Jan 9, 2012 9:09 AM
  • Thesis
  • English
  • Adobe PDF
    Adobe PDF
  • 1213405 bytes
    1213393 bytes
  • Recent advances in high-throughput technologies, such as genome-wide SNP analysis and microarray gene expression profiling, have led to a multitude of ranked lists, where the features (SNPs, genes) are sorted based on their individual correlation with a phenotype. Multiple reviews have shown that most such rankings vary considerably across different studies, even in the case of subsampling from a single dataset. This motivates our interest in formally investigating the overlap of the top ranked features in two lists sorted by correlation with an outcome. This dissertation presents a mathematical model for better understanding lists whose entries are ranked by Pearson correlation coefficient with an outcome. We show that our model is able to accurately predict the expected overlap between two ranked lists based on reasonable assumptions. We also discuss how to generalize this model to find the overlap between other forms of rankings, provided that they satisfy mild assumptions.
  • Master's
  • Master of Science
  • Department of Computing Science
  • Spring 2012
  • Russell Greiner (Computing Science)
  • Csaba Szepesvari (Computing Science)
    Sambasivarao Damaraju (Cross Cancer Institute)