Prediction of 1H and 13C NMR Chemical Shifts of Small Molecules Using Machine Learning

  • Author / Creator
    Sayeeda, Zinat
  • Abstract
    For more than 70 years, chemists have used Nuclear Magnetic Resonance (NMR) spectroscopy to characterize the atomic structure and dynamics of molecules. Key to performing the NMR analysis of almost any molecule is a process called “chemical shift assignment”. This involves matching specific peaks or chemical shifts in the NMR spectrum with specific atoms in the molecule. Using a variety of NMR techniques, chemists have performed chemical shift assignments for hundreds of thousands of organic compounds over the past few decades. However, the chemical shift assignment process can be time-consuming, difficult, and prone to errors. Because of these challenges, NMR spectroscopists have long been interested in predicting NMR chemical shifts. Accurate methods to predict 1H (hydrogen) and 13C (carbon) NMR chemical shifts of organic molecules would greatly improve the speed and accuracy with which chemical shift assignments could be made. Over the past two decades, a variety of methods, ranging from ab initio approaches to database search methods to machine learning (ML) techniques, have been applied to improve chemical shift prediction. The most promising of these are the ML methods. However, most ML methods do not achieve the level of accuracy required for consistent chemical shift assignments of small molecules, nor do they properly handle diastereotopic protons, solvent effects, pH effects, or alternate chemical shift referencing schemes. In this thesis, I describe my efforts to develop an ML-based NMR chemical shift predictor that can accurately predict 1H and 13C NMR chemical shifts while also accommodating diastereotopic protons, solvent effects, pH effects, and alternate chemical shift referencing schemes. In developing this predictor, called NMRPred, I assembled and curated a large dataset of carefully assigned and carefully referenced experimental 1H and 13C NMR assignments from 953 molecules.
    I also tested a variety of feature extraction and ML methods to develop two separate predictors, one for 1H and another for 13C chemical shifts. The best performing 1H predictor, which used a Random Forest Regressor, obtained a Mean Absolute Error (MAE) of 0.11 ppm with a standard deviation of 0.18 ppm on a validation set of 272 1H assignments and an MAE of 0.36 ppm with a standard deviation of 0.56 ppm on a second validation set of 442 1H assignments. The best performing 13C predictor, which used a Gradient Boost Regressor, obtained an MAE of 2.94 ppm with a standard deviation of 4.2 ppm on a validation set of 1087 13C assignments and an MAE of 6.65 ppm with a standard deviation of 8.65 ppm on a second validation set of 653 13C assignments. On the first validation set, the 1H shift predictor outperformed other chemical shift predictors in terms of its accuracy (MAE) and its ability to handle diastereotopic protons, solvent effects, pH effects, and alternate chemical shift referencing schemes. Unfortunately, the 13C shift predictor did not match the performance of the most recent and widely used 13C shift predictors. In this thesis, I discuss some of the reasons why this may have happened, and I present evidence suggesting that a larger and more varied dataset would make it possible to improve the performance of both the 1H and 13C shift predictors.
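
    The abstract reports accuracy as an MAE together with a standard deviation, both in ppm. As an illustration only, the sketch below shows one common way such figures can be computed from paired observed and predicted shifts. It is not taken from the thesis: the function name `mae_and_std`, the sample shift values, and the choice to report the population standard deviation of the absolute errors are all assumptions made for this example.

```python
from statistics import mean, pstdev


def mae_and_std(observed, predicted):
    """Return (MAE, standard deviation of the absolute errors), in ppm.

    Assumption: the standard deviation is taken over the absolute errors
    and uses the population formula; the thesis may define it differently.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    # Absolute error for each assigned atom.
    abs_errors = [abs(o - p) for o, p in zip(observed, predicted)]
    return mean(abs_errors), pstdev(abs_errors)


# Hypothetical 1H chemical shifts (ppm), invented for illustration.
observed = [7.26, 3.50, 1.25, 2.10]
predicted = [7.20, 3.62, 1.25, 2.00]

mae, sd = mae_and_std(observed, predicted)
print(f"MAE = {mae:.2f} ppm, SD = {sd:.2f} ppm")
```

    Because 13C shifts span a far wider ppm range than 1H shifts, MAE values for the two nuclei are not directly comparable; each is best judged against other predictors for the same nucleus.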

  • Graduation date
    Spring 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-pk9w-p322
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.