Improved Bayesian Scoring Schemes for Protein NMR Backbone Resonance Sequential Assignment

Wagner, James; Tegos, Theodore; Wan, Xiang; Lin, Guohui

doi:10.7939/R3Q51G

Communities and Collections

Computing Science, Department of / Technical Reports (Computing Science)

Description
Technical report TR06-10 (TRID-ID: TR06-10).

Background: Accurately quantifying the signature information of chemical shifts provides a foundation for accurate and complete sequential resonance assignment in protein NMR spectroscopy. A nearly complete assignment is a prerequisite for three-dimensional protein structure calculation.

Methods: A number of filtering steps are applied to known protein NMR data to construct two training datasets for learning scoring schemes that quantify the signature information. The scoring schemes are learned through a naive Bayesian method that uses both intra-residue and inter-residue chemical shifts, together with the intermediate neural network output of the secondary structure predictor PsiPred.

Results: Two training datasets, ALL and HOMO, were carefully constructed for scoring scheme learning. Based on these two datasets, a total of 16 scoring schemes were proposed and examined. An extensive simulation study was set up to validate these scoring schemes, and the best-performing scheme was implemented in a publicly accessible web server.

Conclusions: Through the extensive simulation study we found that the currently known protein NMR data is quite evenly distributed in terms of protein homology, and therefore homology removal during training dataset construction yields little gain in the overall performance of the resultant scoring schemes. We also conclude that, in general, naive Bayesian learning performs better than a trivial distribution assumption. We believe this conclusion holds not just in our case but also for similar applications where the training data size is large. A further conclusion is that in applications where PsiPred prediction results are used as intermediate input, using its intermediate neural network output could be a better choice than using its final prediction result.
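To make the idea of a naive Bayesian scoring scheme concrete, the following is a minimal sketch and not the report's implementation. It assumes each chemical shift is modeled by an independent Gaussian per residue type, which is an illustrative choice that the abstract does not confirm. The function names (`fit_gaussian_naive_bayes`, `log_score`, `best_label`) and the toy shift values are hypothetical, and the sketch leaves out the inter-residue shifts and PsiPred output mentioned above.

```python
import math
from collections import defaultdict


def fit_gaussian_naive_bayes(samples):
    """Estimate a log-prior and per-feature (mean, variance) for each label.

    samples: list of (label, feature_tuple) pairs, e.g. a residue type
    paired with its observed chemical shifts in ppm.
    """
    grouped = defaultdict(list)
    for label, features in samples:
        grouped[label].append(features)

    total = len(samples)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        params = []
        for column in zip(*rows):  # iterate feature by feature
            mean = sum(column) / n
            var = sum((x - mean) ** 2 for x in column) / n
            params.append((mean, max(var, 1e-6)))  # floor avoids zero variance
        model[label] = (math.log(n / total), params)
    return model


def log_score(model, label, features):
    """Log-prior plus the sum of per-feature Gaussian log-densities.

    Summing the terms is the naive (conditional independence) assumption.
    """
    log_prior, params = model[label]
    score = log_prior
    for x, (mean, var) in zip(features, params):
        score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
    return score


def best_label(model, features):
    """Return the label with the highest naive Bayesian log-score."""
    return max(model, key=lambda label: log_score(model, label, features))


# Hypothetical toy data: (residue type, (CA shift, CB shift)) in ppm.
# These numbers are made up for illustration, not taken from the report.
TOY_SAMPLES = [
    ("ALA", (52.5, 19.0)), ("ALA", (53.1, 18.6)), ("ALA", (52.0, 19.5)),
    ("SER", (58.3, 63.8)), ("SER", (58.9, 64.2)), ("SER", (57.8, 63.5)),
]

model = fit_gaussian_naive_bayes(TOY_SAMPLES)
print(best_label(model, (52.7, 19.1)))
```

On this toy data the query shifts sit close to the alanine cluster, so the alanine score dominates. A real scoring scheme would be trained on a filtered dataset such as ALL or HOMO and would condition on additional features.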
Date created

2006
Subjects / Keywords
- NMR spectroscopy
- Bayesian scoring
Type of Item

Report
DOI

https://doi.org/10.7939/R3Q51G
License

Attribution 3.0 International

Language
- English