High-throughput Computational Characterization and Prediction of MicroRNA Targets

  • Author / Creator
    Fan, Xiao
  • MicroRNAs (miRNAs) are short (~21 nucleotides) endogenous noncoding RNAs. They are widespread post-transcriptional regulators in eukaryotes that bind target messenger RNAs (mRNAs) and regulate protein expression levels. MiRNAs have attracted substantial research attention, and, thanks to sequencing efforts, the number of known miRNAs has grown continually over the past decade. We contributed to these efforts by designing, building, and applying a comprehensive platform for end-to-end processing of miRNA data generated by next-generation sequencing. The platform, which integrates multiple computational tools, filters out known miRNAs, discovers new miRNAs, and quantifies differential expression among samples. The key to deciphering the functional roles of the fast-growing number of miRNAs is the high-throughput identification of miRNA targets. Computational prediction methods are widely used for this purpose. We review a comprehensive collection of 38 miRNA target predictors in animals that were developed over the last decade. Our in-depth analysis considers all significant perspectives, including the underlying methodologies, ease of use, availability, impact, and evaluation protocols. We comparatively evaluate seven representative methods when predicting targets at different levels of annotation and when predicting different types of targets. Among other observations, we found that current miRNA target predictors identify, on average, only 7% of non-canonical miRNA targets, i.e., targets that have fewer than 7 Watson-Crick base pairs in the seed region (nucleotides 1–8 from the 5’ end of the miRNA). Moreover, our large-scale analysis of 3’ UTRs in several databases reveals that about half of miRNA targets are non-canonical. These targets are prevalent and hard to predict, which motivated us to develop the first custom-designed high-throughput method that accurately predicts non-canonical targets solely from the miRNA and target sequences.
Empirical tests on targets annotated with low-throughput methods, microarrays, RNA-seq, and pSILAC show that our method correctly predicts 40% of non-canonical targets and finds highly repressed genes more accurately than existing methods.
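The canonical/non-canonical split used above hinges on counting Watson-Crick pairs in seed nucleotides 1–8. The snippet below is a minimal sketch of that count, not the thesis's method. It assumes an ungapped alignment in which the target site is written 5'→3' so that its 3' end pairs with the miRNA 5' end, and it does not count G:U wobble pairs. The function names `seed_wc_pairs` and `is_canonical` are hypothetical and chosen only for illustration.

```python
# Watson-Crick pairs only; G:U wobble pairs are deliberately not counted.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

SEED_START, SEED_END = 1, 8  # nucleotides 1-8 from the miRNA 5' end (1-based, inclusive)


def seed_wc_pairs(mirna: str, site: str) -> int:
    """Count Watson-Crick pairs between miRNA nucleotides 1-8 and a target site.

    Assumption: `site` is the mRNA segment written 5'->3' and aligned to the
    miRNA without gaps, so the last base of `site` pairs with miRNA nucleotide 1.
    DNA letters (T) are accepted and converted to RNA (U).
    """
    mirna = mirna.upper().replace("T", "U")
    # Reverse the site so that index i in both strings is a paired position.
    paired = site.upper().replace("T", "U")[::-1]
    if len(mirna) < SEED_END or len(paired) < SEED_END:
        raise ValueError("both sequences must cover miRNA nucleotides 1-8")
    return sum(
        (mirna[i], paired[i]) in WATSON_CRICK
        for i in range(SEED_START - 1, SEED_END)
    )


def is_canonical(mirna: str, site: str, min_pairs: int = 7) -> bool:
    """Canonical if the seed has at least `min_pairs` Watson-Crick pairs."""
    return seed_wc_pairs(mirna, site) >= min_pairs


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"
    # Perfectly complementary seed site: 8 pairs, so canonical.
    print(seed_wc_pairs(let7a, "CUACCUCA"), is_canonical(let7a, "CUACCUCA"))
    # Two seed mismatches: 6 pairs, so non-canonical under the <7 rule.
    print(seed_wc_pairs(let7a, "CUACCAAA"), is_canonical(let7a, "CUACCAAA"))
```

With an ungapped alignment the count reduces to a position-by-position comparison. Real target sites can contain bulges and gaps, and a predictor aimed at non-canonical targets would need an alignment step that this sketch omits.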

  • Degree
    Doctor of Philosophy
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.