Estimating the Overlap of Top Instances in Lists Ranked by Correlation to Label

  • Author / Creator
    Damavandi, Babak
  • Recent advances in high-throughput technologies, such as genome-wide SNP analysis and microarray gene expression profiling, have led to a multitude of ranked lists, where the features (SNPs, genes) are sorted based on their individual correlation with a phenotype. Multiple reviews have shown that most such rankings vary considerably across different studies, even in the case of subsampling from a single dataset. This motivates our interest in formally investigating the overlap of the top ranked features in two lists sorted by correlation with an outcome. This dissertation presents a mathematical model for better understanding lists whose entries are ranked by Pearson correlation coefficient with an outcome. We show that our model is able to accurately predict the expected overlap between two ranked lists based on reasonable assumptions. We also discuss how to generalize this model to find the overlap between other forms of rankings, provided that they satisfy mild assumptions.

  • Subjects / Keywords
  • Graduation date
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
    • Department of Computing Science
  • Supervisor / co-supervisor and their department(s)
    • Russell Greiner (Computing Science)
  • Examining committee members and their departments
    • Csaba Szepesvari (Computing Science)
    • Sambasivarao Damaraju (Cross Cancer Institute)
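
The quantity studied in the abstract, the overlap of the top-ranked features in two lists sorted by Pearson correlation with an outcome, can be illustrated empirically with a small simulation. The sketch below is not the thesis's mathematical model; it only measures the overlap on synthetic data. The function names, the synthetic data generator, the default parameters, and the choice to rank by absolute correlation are all assumptions made for illustration.

```python
import numpy as np


def pearson_with_label(X, y):
    """Pearson correlation of each column of X with the label vector y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (Xc.T @ yc) / denom


def top_k_overlap(scores_a, scores_b, k):
    """Count features that appear in the top k of both rankings.

    Assumption: features are ranked by absolute correlation, largest first.
    """
    top_a = set(np.argsort(-np.abs(scores_a))[:k])
    top_b = set(np.argsort(-np.abs(scores_b))[:k])
    return len(top_a & top_b)


def simulate_overlap(n=200, p=1000, n_signal=20, effect=0.3, k=50, seed=0):
    """Split one synthetic dataset into two halves and compare their rankings.

    Hypothetical setup: the first n_signal features depend linearly on the
    label; the remaining features are pure noise.
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n)
    X = rng.standard_normal((n, p))
    X[:, :n_signal] += effect * y[:, None]

    # Two disjoint subsamples drawn from the same dataset.
    perm = rng.permutation(n)
    half_a, half_b = perm[: n // 2], perm[n // 2:]

    r_a = pearson_with_label(X[half_a], y[half_a])
    r_b = pearson_with_label(X[half_b], y[half_b])
    return top_k_overlap(r_a, r_b, k)


if __name__ == "__main__":
    overlaps = [simulate_overlap(seed=s) for s in range(20)]
    print("mean top-50 overlap over 20 runs:", np.mean(overlaps))
```

Varying `effect`, `n`, or `k` in this sketch gives a rough feel for how unstable such rankings can be under subsampling, which is the behaviour the abstract says the model aims to predict.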