Advancing ECG Analysis through Machine Learning: A Study on Data Generation for ECG Classification and Feature Selection For Individual Survival Prediction
-
- Author / Creator
- Nademi, Yousef
-
Electrocardiograms (ECGs) are a valuable and easily collected measurement of heart health, reflecting its morphology (R peak, QRS duration, etc.) and rhythm (the sequence of multiple heartbeats). With the advance of machine learning, many studies use ECG signals for various purposes, such as detecting ECG abnormalities, predicting patient mortality, and other supervised tasks. In this thesis, we used the Alberta Hospital ECG Dataset, consisting of more than 1.6 million ECGs collected from 244,077 patients, for two objectives: (1) to produce a generative model that can generate synthetic ECGs for a specific abnormality, which can then augment a dataset of real ECGs in order to improve the performance of a machine-learned model for ECG abnormality classification; and (2) to explore and compare different approaches for extracting high-level features from ECG signals and determine which approach is most effective in estimating patient-specific survival curves for accurately predicting time-to-death.
For the first objective, we used this ECG dataset, where each 12-lead ECG is labeled with one of 15 diagnosed abnormalities, to train an unsupervised beta variational autoencoder (β-VAE) model that could generate synthetic 12-lead ECG time series for each specified abnormality. We then used this generative model to generate ECGs with the ST-segment elevated (STE) abnormality. These generated ECGs were then added to the public dataset from the China Physiological Signal Challenge 2018, which contained 6,877 real ECGs. This dataset included healthy controls (sinus rhythm) and 8 different abnormalities. We found that a learner trained on this extended dataset not only performed better on the targeted STE label than one trained on only the original data, but also improved its classification performance on 4 other labels.
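As a rough illustration of the β-VAE objective mentioned above, the sketch below computes a reconstruction term plus a β-weighted KL term for a diagonal Gaussian posterior. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the squared-error reconstruction term, and the default value of `beta` are all hypothetical, and the abstract does not say how the abnormality label is supplied to the model.

```python
import math

def kl_diag_gaussian(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def beta_vae_loss(x, x_hat, mu, log_var, beta=4.0):
    """Squared-error reconstruction term plus a beta-weighted KL term.

    x and x_hat are flat sequences of signal samples (assumed layout);
    beta=4.0 is an illustrative default, not a value taken from the thesis.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * kl_diag_gaussian(mu, log_var)
```

Setting `beta` to 1 recovers the usual VAE objective; larger values put more weight on keeping the latent code close to the standard normal prior.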
For the second objective, we explored ways to obtain useful high-level features from ECG traces through various approaches, including supervised learning with clinical diagnoses, unsupervised learning, and knowledge-based ECG features. Using these ECG features, along with age and sex, we trained models to estimate patient-specific individual survival distributions (ISDs) to predict each patient’s time-to-death. The results showed that ECG features produced by supervised learning approaches led to models that were superior in estimating patient-specific time-to-death compared with ECG features obtained from unsupervised and knowledge-based methods. In fact, the supervised ECG features required fewer training instances (as few as 500) to learn ISD models that performed better than models that used only age and sex. By contrast, unsupervised and knowledge-based ECG features required over 5,000 training samples to produce ISD models that performed better than ones using only age and sex. These findings may assist researchers in selecting the most appropriate approach for extracting high-level features from ECG signals to estimate patient-specific ISD curves.
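To make the link between an individual survival curve and a time-to-death prediction concrete, the sketch below reads a point estimate off a discretized curve as the median survival time. This is only one common convention and is an assumption here: the abstract does not state which point estimate the thesis uses, and the function name and the linear interpolation between grid points are hypothetical.

```python
def median_survival_time(times, survival):
    """Return the time at which the survival curve first reaches 0.5.

    times: increasing time points; survival: S(t) at those points,
    non-increasing, with S(0) = 1 assumed. Linear interpolation is used
    between grid points. Returns None if the curve never drops to 0.5.
    """
    prev_t, prev_s = 0.0, 1.0
    for t, s in zip(times, survival):
        if s <= 0.5:
            # prev_s > 0.5 >= s here, so the denominator is positive.
            frac = (prev_s - 0.5) / (prev_s - s)
            return prev_t + frac * (t - prev_t)
        prev_t, prev_s = t, s
    return None
```

For example, a patient whose predicted curve falls from 0.7 at year 2 to 0.4 at year 3 would get a predicted time-to-death between those two years.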
- Subjects / Keywords
-
- Graduation date
- Spring 2024
-
- Type of Item
- Thesis
-
- Degree
- Master of Science
-
- License
- This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.