A disease classifier for metabolic profiles based on metabolic pathway knowledge

  • Author / Creator
    Eastman, Thomas
  • Abstract
    This thesis presents Pathway Informed Analysis (PIA), a classification method that predicts disease states (diagnoses) from metabolic profile measurements and incorporates biological knowledge in the form of metabolic pathways. A metabolic pathway describes a set of chemical reactions that perform a specific biological function. Much of the biological knowledge produced by efforts to identify and understand these pathways is formalized in readily accessible databases such as the Kyoto Encyclopedia of Genes and Genomes. PIA uses metabolic pathways to identify relationships among the metabolite concentrations that a metabolic profile measures. Specifically, PIA assumes that the metabolite concentrations within each class (diseased or healthy) follow a multivariate normal distribution. It further assumes that conditional independence statements derived from the pathways describe how the metabolite concentrations relate to each other under these distributions. Together, the two assumptions allow a natural representation of the class-conditional distributions using a type of probabilistic graphical model called a Gaussian Markov Random Field. PIA efficiently estimates the parameters defining these distributions from example patients to produce a classifier. It classifies an undiagnosed patient by evaluating both models to determine the most probable class given the patient's metabolic profile.

    We apply PIA to a data set of cancer patients to diagnose those with a muscle-wasting disease called cachexia. Standard machine learning algorithms such as Naive Bayes, Tree-augmented Naive Bayes, Support Vector Machines, and C4.5 serve as baselines for evaluating the performance of PIA. On this data set, the overall classification accuracy of PIA is higher than that of these algorithms, but the difference is not statistically significant. We also apply PIA to several other classification tasks. Some involve predicting which manipulations of metabolic processes were performed in experiments with worms. Others classify pigs according to properties of their dietary intake. On these tasks, the accuracy of PIA is not significantly better than that of the standard algorithms.
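
The classification rule described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the three-metabolite chain, the precision matrix, the class means, and the equal priors are all hypothetical, and the parameters are given by hand rather than estimated from patients. A zero entry in the precision matrix encodes conditional independence between two metabolites, which is the property a Gaussian Markov Random Field takes from the pathway structure.

```python
import numpy as np


def gmrf_log_density(x, mean, precision):
    """Log-density of a multivariate normal given its precision matrix."""
    diff = x - mean
    sign, logdet = np.linalg.slogdet(precision)
    if sign <= 0:
        raise ValueError("precision matrix must be positive definite")
    k = x.shape[0]
    return 0.5 * (logdet - k * np.log(2.0 * np.pi) - diff @ precision @ diff)


def classify(x, models, priors):
    """Return the class with the highest log prior plus log likelihood."""
    scores = {
        label: np.log(priors[label]) + gmrf_log_density(x, mean, precision)
        for label, (mean, precision) in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical pathway chain A - B - C: A and C are conditionally
# independent given B, so the (A, C) precision entries are zero.
chain_precision = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])

# Hypothetical class-conditional models: (mean vector, precision matrix).
models = {
    "healthy": (np.zeros(3), chain_precision),
    "diseased": (np.full(3, 2.0), chain_precision),
}
priors = {"healthy": 0.5, "diseased": 0.5}

profile = np.array([1.9, 2.1, 2.0])
predicted = classify(profile, models, priors)
```

In this sketch both classes share one precision matrix for brevity; the abstract describes a separate distribution per class, which the `models` dictionary allows by giving each class its own mean and precision.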

  • Graduation date
    Spring 2010
  • Degree
    Master of Science
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.