ERA

Permanent link (DOI): https://doi.org/10.7939/R3Q51G

Communities

This file is in the following communities:

Computing Science, Department of

Collections

This file is in the following collections:

Technical Reports (Computing Science)

Improved Bayesian Scoring Schemes for Protein NMR Backbone Resonance Sequential Assignment (Open Access)

Descriptions

Author or creator
Wagner, James
Tegos, Theodore
Wan, Xiang
Lin, Guohui
Subject/Keyword
NMR spectroscopy
Bayesian scoring
Type of item
Computing Science Technical Report
Computing science technical report ID
TR06-10
Language
English
Description
Technical report TR06-10.

Background: Accurately quantifying the signature information of chemical shifts provides a foundation for accurate and complete sequential resonance assignment in protein NMR spectroscopy. A nearly complete assignment is a prerequisite for three-dimensional protein structure calculation.

Methods: A number of filtering steps are applied to known protein NMR data to construct two training datasets, which are used to learn scoring schemes that quantify this signature information. The scoring schemes are learned through a naive Bayesian method and use both intra-residue and inter-residue chemical shifts, together with the intermediate neural network output of the secondary structure predictor PsiPred.

Results: Two training datasets, ALL and HOMO, were carefully constructed for scoring scheme learning. Based on these two datasets, a total of 16 scoring schemes were proposed and examined. An extensive simulation study was set up to validate these scoring schemes, and the best-performing one was implemented in a publicly accessible web server.

Conclusions: The extensive simulation study showed that currently known protein NMR data are quite evenly distributed in terms of protein homology; therefore, removing homologous entries during training dataset construction would not substantially improve the overall performance of the resulting scoring schemes. We also conclude that, in general, naive Bayesian learning is better than a trivial distribution assumption. We believe this conclusion holds not only in our case but also for similar applications where the training dataset is large. Finally, in applications where PsiPred predictions are used as intermediate input, using its intermediate neural network output could be a better choice than using its final prediction result.
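To illustrate the general naive Bayesian scoring idea described in the Methods, the following minimal Python sketch scores how well an observed set of chemical shifts matches each residue type. It assumes Gaussian class-conditional densities for each shift and conditional independence of the shifts given the residue type. The function names (fit_naive_bayes, gaussian_logpdf, score), the feature keys and the toy numbers are hypothetical and do not come from the report, which may use different distributions, features (including the PsiPred neural network output) and training data.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def fit_naive_bayes(training):
    """Fit a hypothetical Gaussian naive Bayes model.

    training: list of (residue_type, {atom: shift_ppm}) pairs.
    Returns (log_priors, params) where params[res][atom] = (mu, sigma).
    """
    counts = defaultdict(int)
    values = defaultdict(lambda: defaultdict(list))
    for res, shifts in training:
        counts[res] += 1
        for atom, x in shifts.items():
            values[res][atom].append(x)
    total = sum(counts.values())
    # Prior P(residue type) estimated from class frequencies.
    log_priors = {r: math.log(c / total) for r, c in counts.items()}
    params = {}
    for res, atoms in values.items():
        params[res] = {}
        for atom, xs in atoms.items():
            # Floor on sigma avoids a zero variance when values coincide.
            sigma = max(pstdev(xs), 0.1)
            params[res][atom] = (mean(xs), sigma)
    return log_priors, params


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def score(shifts, res, log_priors, params):
    """Naive Bayes log score: log P(res) + sum of per-shift log-likelihoods.

    Observed atoms with no model for this residue type are skipped,
    which is a simplification of this sketch.
    """
    s = log_priors[res]
    for atom, x in shifts.items():
        if atom in params[res]:
            mu, sigma = params[res][atom]
            s += gaussian_logpdf(x, mu, sigma)
    return s


if __name__ == "__main__":
    # Toy, made-up training values (illustrative only, not the ALL or HOMO data).
    toy = [
        ("ALA", {"CA": 52.5, "CB": 19.0}),
        ("ALA", {"CA": 53.1, "CB": 18.6}),
        ("GLY", {"CA": 45.2}),
        ("GLY", {"CA": 45.6}),
    ]
    lp, params = fit_naive_bayes(toy)
    obs = {"CA": 52.8, "CB": 18.8}
    ranking = sorted(lp, key=lambda r: score(obs, r, lp, params), reverse=True)
    print(ranking)  # ['ALA', 'GLY']
```

Feature keys are arbitrary strings, so inter-residue shifts could be added as extra keys (for example a hypothetical "CA_prev"), although a more faithful model would condition such shifts on the preceding residue type rather than the current one.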
Date created
2006
DOI
doi:10.7939/R3Q51G
License information
Creative Commons Attribution 3.0 Unported

File Details

Date Modified
2014-05-01T02:09:28.811+00:00
Characterization
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 655541 bytes
Last modified: 2015-10-12 13:46:32 -06:00
Filename: TR06-10.pdf
Original checksum: a72e71ccfdc007e8d1d03cde24adbdcd
Well formed: true
Valid: true
File title: Start Of Article
Page count: 30