Computational support systems for prediction and characterization of protein crystallization outcomes

  • Author / Creator
    Mizianty, Marcin J
  • Analysis of protein structures may reveal their function, regulation, and interactions. Almost 90% of the known protein structures were solved using X-ray crystallography; however, many more structures remain unsolved. The Protein Structure Initiative (PSI) was created to speed up structure determination. PSI includes structural genomics (SG) centers that perform high-throughput crystallization, processing hundreds of proteins using standardized protocols. The large quantities of crystallization data generated by PSI fueled research into the protein properties associated with successful crystallization. In spite of intense research, protein crystallization is still among the most complex and least understood problems in structural biology. Since SG centers do not focus on individual proteins but rather on covering the protein structure space, they have some flexibility in the selection of targets. At the beginning of my PhD program, we designed and assessed three accurate methods that predict crystallization propensity from the protein sequence. These methods could be used to prioritize targets based on their predicted propensity for successful structure determination. We observed that as crystallization protocols are updated, the predictors of crystallization propensity need to be correspondingly upgraded and enhanced. To this end, in the course of the thesis we developed an accurate predictor that generates the crystallization propensity and indicates the causes of potential crystallization failure, which can occur at any of the three major steps in the protein crystallization protocol: production of protein material, purification, and production of crystals. Empirical comparisons against state-of-the-art methods in the field demonstrate the favorable predictive performance of our predictors.
    Finally, we designed another accurate and runtime-efficient method, which we then used to perform a first-of-its-kind large-scale analysis of crystallization propensity for the proteins encoded in 1,953 fully sequenced genomes. Analysis of these predictions shows that current X-ray crystallography combined with homology modeling could provide an average per-proteome structural coverage of 73%, with over 60% coverage for archaeal and bacterial proteomes and between 35% and 70% for eukaryotes. Moreover, our study revealed that knowledge-based target selection increases coverage by a significant margin, which for the majority of organisms is between 25% and 40%.

  • Graduation date
    Fall 2013
  • Degree
    Doctor of Philosophy
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Institution
    University of Alberta
  • Specialization
    • Software Engineering and Intelligent Systems
  • Examining committee members and their departments
    • Michalak, Marek (Biochemistry)
    • Kurgan, Lukasz (Electrical and Computer Engineering)
    • Dick, Scott (Electrical and Computer Engineering)
    • Zemp, Roger (Electrical and Computer Engineering)
    • Godzik, Adam (Sanford-Burnham Institute, La Jolla, CA)
    • Reformat, Marek (Electrical and Computer Engineering)