Improved Bayesian Scoring Schemes for Protein NMR Backbone Resonance Sequential Assignment

  • Technical report TR06-10.

    Background: Accurately quantifying the signature information of chemical shifts provides a foundation for accurate and complete sequential resonance assignment in protein NMR spectroscopy. A nearly complete assignment is a prerequisite for three-dimensional protein structure calculation.

    Methods: A number of filtering steps are applied to known protein NMR data to construct two training datasets. These datasets are used to learn scoring schemes that quantify this signature information. The scoring schemes are learned through a naive Bayesian method. They use both intra-residue and inter-residue chemical shifts, as well as the intermediate neural network output of the secondary structure predictor PsiPred.

    Results: Two training datasets, ALL and HOMO, were carefully constructed for scoring scheme learning. Based on these two datasets, a total of 16 scoring schemes were proposed and examined. An extensive simulation study was set up to validate these scoring schemes, and the best-performing scheme was implemented in a publicly accessible web server.

    Conclusions: The extensive simulation study showed that currently known protein NMR data are quite evenly distributed in terms of protein homology. Therefore, removing homologous proteins during training dataset construction would not gain much in the overall performance of the resulting scoring schemes. We also conclude that, in general, naive Bayesian learning is better than a trivial distribution assumption. We believe this conclusion holds not just in our case but also for similar applications where the training data size is large. Finally, in applications where PsiPred prediction results are used as intermediate input, using its intermediate neural network output could be a better choice than using its final prediction result.

    TRID-ID: TR06-10

  • License
    Attribution 3.0 International
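The abstract states that the scoring schemes are learned with a naive Bayesian method over chemical shifts. To make the idea concrete, here is a minimal sketch of naive Bayesian scoring of a spin system against candidate residue types. It treats each atom's shift as conditionally independent given the residue type. Several elements are assumptions for illustration and are not taken from the report. These include the Gaussian per-atom densities, the hypothetical SHIFT_MODEL parameters (rough, illustrative values), the uniform PRIOR, the restriction to three residue types, and the function names.

```python
import math

# HYPOTHETICAL per-residue-type Gaussian parameters (mean, std) in ppm.
# Illustrative values only; NOT the parameters learned in the report.
SHIFT_MODEL = {
    "ALA": {"CA": (53.1, 2.0), "CB": (19.0, 1.8)},
    "GLY": {"CA": (45.4, 1.3)},  # glycine has no CB atom
    "SER": {"CA": (58.7, 2.1), "CB": (63.8, 1.5)},
}
# HYPOTHETICAL uniform prior over the residue types above.
PRIOR = {aa: 1.0 / len(SHIFT_MODEL) for aa in SHIFT_MODEL}


def log_gaussian(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def naive_bayes_score(shifts, aa):
    """Return log P(aa) + sum over atoms of log p(shift | aa).

    Atoms are assumed conditionally independent given the residue type
    (the naive Bayes assumption)."""
    model = SHIFT_MODEL[aa]
    score = math.log(PRIOR[aa])
    for atom, value in shifts.items():
        if atom not in model:
            return float("-inf")  # e.g. a CB shift cannot come from glycine
        mu, sigma = model[atom]
        score += log_gaussian(value, mu, sigma)
    return score


def posterior(shifts):
    """Normalize the per-type scores into posterior probabilities."""
    scores = {aa: naive_bayes_score(shifts, aa) for aa in SHIFT_MODEL}
    best = max(scores.values())
    if best == float("-inf"):
        raise ValueError("no residue type is compatible with these shifts")
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {aa: math.exp(s - best) for aa, s in scores.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


if __name__ == "__main__":
    probs = posterior({"CA": 52.8, "CB": 19.3})
    print(max(probs, key=probs.get))  # ALA
```

In this sketch, inter-residue shifts (for example, CA and CB of residue i-1 observed in the same spin system) or secondary-structure features such as PsiPred outputs would enter as further conditionally independent log-likelihood terms. That extension is an assumption about how such features could be combined, not a description of the report's 16 schemes.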