The challenge of applying machine learning techniques to diagnose schizophrenia using multi-site fMRI data

Vega Romero, Roberto I

doi:10.7939/R3W08WV1D

One of the main challenges in applying machine learning techniques to neuroimaging data is the small n, large p problem: datasets usually contain only a few hundred instances (n), each described by hundreds of thousands of features (p). In this thesis, we explore two ways of addressing this problem when distinguishing people with schizophrenia from healthy controls: reducing the number of features by analyzing 264 specific regions of interest of the brain, and increasing the number of instances by merging imaging data obtained from different scanning sites. Empirical results show that, using features related to the functional connectivity of the brain, we can achieve an accuracy above chance level (over 70%) when data from a single scanning site are used for both training and testing. However, this performance decreases when additional data from a different scanning site are added to the training set. We attribute this decrease to batch effects: technical noise, introduced at the different scanning sites, that confounds the biological signal of interest. Batch effects are frequently disregarded in association studies because there is typically no statistically significant interaction between the scanning site and the variables being analyzed. In this work, we highlight important differences between association studies and prediction studies, and we argue that batch effects matter in the latter. Our experiments reveal that ignoring them yields a classifier that performs worse than one trained on data from a single scanning site, even though restricting training to a single site drastically reduces the size of the training set. In addition, we can train a classifier that distinguishes among scanning sites (rather than cases vs. controls) with an accuracy above 80%. We also show empirically that, if the same subjects are scanned at two different sites, a neural network that maps the fMRI scans from one scanner to the other is sufficient to correct the batch effects.
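As a rough illustration of how connectivity features turn a scan into a fixed-length vector, the sketch below assumes each subject's preprocessed fMRI has already been reduced to one mean time series per region of interest (a T × 264 array) and uses Pearson correlation between every pair of regions as the connectivity measure. The function name `connectivity_features` and the choice of Pearson correlation are assumptions for illustration, not necessarily the exact features used in the thesis. Keeping only the upper triangle of the symmetric correlation matrix gives 264 × 263 / 2 = 34,716 features per subject, far fewer than voxel-level features but still many more than the number of subjects.

```python
import numpy as np


def connectivity_features(roi_timeseries: np.ndarray) -> np.ndarray:
    """Map a (T, R) array of ROI time series to R*(R-1)/2 pairwise
    Pearson correlations (a hypothetical feature definition)."""
    # np.corrcoef treats rows as variables, so transpose to (R, T).
    corr = np.corrcoef(roi_timeseries.T)
    # The matrix is symmetric with ones on the diagonal; keep only the
    # strictly upper triangle to avoid redundant features.
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    return corr[rows, cols]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_timepoints, n_rois = 150, 264  # 150 volumes is an arbitrary example
    ts = rng.standard_normal((n_timepoints, n_rois))
    features = connectivity_features(ts)
    print(features.shape)  # (34716,)
```

A common extra step, not stated in the abstract, is to apply the Fisher transform (`np.arctanh`) to the correlations before training a classifier.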
In more realistic situations involving disjoint sets of subjects, simple techniques can remove batch effects caused by simple transformations of the feature matrix: z-score normalization removes those caused by translation and scaling, and whitening removes those caused by translation and rotation. Both approaches succeeded in reducing the accuracy of scanning-site classification to near chance level, but neither improved the accuracy of schizophrenia diagnosis using multi-site data. This is a strong indication that batch effects go beyond these simple linear transformations. Finally, we explored BECCA (batch effects correction using canonical correlation analysis) and autoencoder-based approaches for decreasing the influence of batch effects. These attempts were also unsuccessful in our test scenarios, suggesting that batch effects are a serious problem in prediction studies using fMRI data and that more effort is needed to understand their nature in order to reduce their influence.
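The two simple corrections can be sketched as follows, under the assumption that the pooled data form a subjects × features matrix with a site label per row, and that each transformation is estimated from each site's own data only. Per-site z-scoring gives every feature zero mean and unit variance within a site; per-site ZCA whitening (one common variant of whitening, chosen here for illustration) centers each site and maps its covariance to approximately the identity. The function names, the ZCA variant, and the `eps` regularizer are assumptions of this sketch, not the thesis's exact procedure. With only a few hundred subjects and tens of thousands of features the sample covariance is singular, which is why `eps` is added to the eigenvalues; in that regime whitening would usually follow a dimensionality reduction.

```python
import numpy as np


def zscore_per_site(X: np.ndarray, sites: np.ndarray) -> np.ndarray:
    """Standardize each feature to zero mean and unit variance within each site."""
    Z = np.empty_like(X, dtype=float)
    for s in np.unique(sites):
        mask = sites == s
        mu = X[mask].mean(axis=0)
        sd = X[mask].std(axis=0)
        sd[sd == 0] = 1.0  # avoid dividing by zero for features constant within a site
        Z[mask] = (X[mask] - mu) / sd
    return Z


def zca_whiten_per_site(X: np.ndarray, sites: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Center each site and multiply by C^{-1/2} (ZCA whitening), so that each
    site's sample covariance becomes approximately the identity."""
    W = np.empty_like(X, dtype=float)
    for s in np.unique(sites):
        mask = sites == s
        Xc = X[mask] - X[mask].mean(axis=0)
        cov = Xc.T @ Xc / Xc.shape[0]  # same normalization (1/n) as np.std above
        evals, evecs = np.linalg.eigh(cov)  # cov is symmetric, so eigh applies
        inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
        W[mask] = Xc @ inv_sqrt
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: site B is a shifted, rescaled version of site A's
    # distribution, i.e. a pure "translation and scaling" batch effect.
    A = rng.standard_normal((200, 3))
    B = 5.0 + 3.0 * rng.standard_normal((200, 3))
    X = np.vstack([A, B])
    sites = np.array(["A"] * 200 + ["B"] * 200)
    Z = zscore_per_site(X, sites)
    print(Z[sites == "B"].mean(axis=0))  # approximately zero for every feature
```

Removing per-site means and (co)variances in this way is exactly what drives a site classifier toward chance; as the abstract notes, it does not guarantee that the diagnostic signal becomes comparable across sites.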
Graduation date
Spring 2017

Type of Item
Thesis

Degree
Master of Science

DOI
https://doi.org/10.7939/R3W08WV1D

License
This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language
English

Institution
University of Alberta

Degree level
Master's

Department
- Department of Computing Science

Supervisor / co-supervisor and their department(s)
- Brown, Matthew (Psychiatry)
- Greiner, Russell (Computing Science)

Examining committee members and their departments
- Schuurmans, Dale (Computing Science)
- Boulanger, Pierre (Computing Science)
- Greiner, Russell (Computing Science)
- Brown, Matthew (Psychiatry)