Missing SNP Genotype Imputation

  • Author / Creator
    Wang, Yining
  • High-throughput single nucleotide polymorphism (SNP) genotyping technologies conveniently produce large SNP genotype datasets for genome-wide linkage and association studies. Various factors, such as array design and hybridization, can give rise to a certain percentage of missing calls, and the problem becomes severe when the target organism, such as cattle, has no high-resolution genomic sequence available. Missing calls in SNP genotype datasets undermine downstream data analysis, so effective methodologies for dealing with missing genotypes are urgently needed. In this dissertation, we start with a brief introduction to the relevant concepts in genetics, then present a collection of imputation methods, with a focus on machine learning algorithms, to tackle the missing SNP genotype problem. We demonstrate that these imputation approaches achieve satisfactory accuracy when tested on real population SNP genotype datasets, and we highlight where our new methods prove useful. We conclude with some possible future directions for the genome-wide SNP genotype imputation problem.

  • Subjects / Keywords
  • Graduation date
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
    • Department of Computing Science
  • Supervisor / co-supervisor and their department(s)
    • Lin, Guohui (Computing Science)
  • Examining committee members and their departments
    • Li, Changxi (Agricultural, Food and Nutritional Science)
    • Greiner, Russ (Computing Science)