






This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations

Learning to predict the sites of metabolism and metabolic endpoints (Open Access)


Other title
Metabolism prediction
Site of metabolism
Preference learning
Type of item
Degree grantor
University of Alberta
Author or creator
Shi, Zheng
Supervisor and department
Greiner, Russ (Computing Science)
Examining committee member and department
David Wishart (Computing Science)
Guohui Lin (Computing Science)
Department of Computing Science

Date accepted
Graduation date
Master of Science
Degree level
When you ingest anything (e.g., food or medicine), your body breaks down (metabolizes) its molecules; this process clearly affects the compound's safety and effectiveness. The breakdown is facilitated by enzymes, proteins that catalyze these reactions. It is therefore important to predict whether a compound will be metabolized by a particular enzyme, how it will be metabolized, and which compounds the process will produce. This thesis presents the framework and models for software systems that address these three subtasks. The substrate predictor learns to predict whether a given molecule is a substrate of a specific enzyme. Here we focus on the cytochrome P450 (CYP) enzymes, which are involved in the metabolism of 90% of the drugs currently on the market. Each such reaction involves at least one "site of metabolism" (SOM): the single atom within the compound where the reaction occurs. We learned one SOM predictor for each of the 9 CYP enzymes, which predicts which site(s) of the compound will be modified. This SOM predictor uses a novel "ranking and classification" framework and works with simple-to-compute features. Finally, we present a simple way to generate the metabolic endpoints, given the enzyme and the predicted SOMs. Empirical results on small datasets show that our overall system, including the substrate predictor and the SOM predictor, performs well and is superior to state-of-the-art systems in terms of computational efficiency and/or accuracy.
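The abstract describes the SOM predictor's "ranking and classification" framework only at a high level. The sketch below illustrates one generic way such a scheme could work; it is our assumption, not the method used in the thesis. It learns a linear per-atom score with a pairwise perceptron from within-molecule preferences (each known SOM atom should outrank every non-SOM atom of the same molecule), then classifies by always keeping the top-ranked atom(s) and adding any other atom whose score exceeds a threshold. The per-atom feature vectors, the function names `train_pairwise_ranker` and `predict_soms`, and the parameters `top_k` and `threshold` are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-atom features, e.g. [is_aliphatic_C, next_to_heteroatom,
# n_hydrogens]; these are illustrative placeholders, not the thesis's features.
Atom = Sequence[float]
Molecule = Tuple[List[Atom], List[int]]  # (atom feature vectors, 0/1 SOM labels)


def score(w: Sequence[float], x: Atom) -> float:
    """Linear ranking score of one atom."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_ranker(data: List[Molecule], n_features: int,
                          epochs: int = 20, lr: float = 1.0) -> List[float]:
    """Pairwise perceptron (hypothetical stand-in for the ranking step):
    within each molecule, a known SOM atom should score higher than every
    non-SOM atom of the same molecule."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for atoms, labels in data:
            soms = [a for a, y in zip(atoms, labels) if y == 1]
            others = [a for a, y in zip(atoms, labels) if y == 0]
            for pos in soms:
                for neg in others:
                    # Update only when the preferred atom is not ranked
                    # strictly above the other one.
                    if score(w, pos) <= score(w, neg):
                        w = [wi + lr * (p - n) for wi, p, n in zip(w, pos, neg)]
    return w


def predict_soms(w: Sequence[float], atoms: List[Atom],
                 threshold: float = 0.0, top_k: int = 1) -> List[int]:
    """Classification step on top of the ranking: the top_k atoms are always
    predicted as SOMs, plus any other atom whose score exceeds threshold."""
    order = sorted(range(len(atoms)), key=lambda i: score(w, atoms[i]),
                   reverse=True)
    chosen = set(order[:top_k])
    chosen.update(i for i in order if score(w, atoms[i]) > threshold)
    return sorted(chosen)


if __name__ == "__main__":
    # Toy training data: two small molecules with hypothetical features.
    train = [
        ([[1, 1, 2], [1, 0, 3], [0, 0, 0]], [1, 0, 0]),
        ([[1, 0, 2], [1, 1, 1], [1, 0, 3]], [0, 1, 0]),
    ]
    w = train_pairwise_ranker(train, n_features=3)
    new_atoms = [[1, 0, 3], [1, 1, 2], [0, 0, 0], [1, 1, 3]]
    print(w, predict_soms(w, new_atoms), predict_soms(w, new_atoms, threshold=1.5))
```

On this toy data, training converges to weights `[1.0, 3.0, -1.0]`; the new four-atom molecule then gets atoms 1 and 3 predicted as SOMs with the default threshold, and only atom 1 when the threshold is raised to 1.5. Separating the ranking from the classification step lets one learned ordering serve molecules that have either a single SOM or several.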
This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
Citation for previous publication

File Details

Date Uploaded
Date Modified
File format: pdf (PDF/A)
Mime type: application/pdf
File size: 885969 bytes
Last modified: 2016-06-16 17:03:58 (UTC-06:00)
Filename: Shi_Zheng_201601_Master.pdf
Original checksum: df79a25a9107399b0f5475eb707b768a
Well formed: true
Valid: true
Page count: 52