ERA

Permanent link (DOI): https://doi.org/10.7939/R34Q7R14G

Communities

This file is in the following communities:

Graduate Studies and Research, Faculty of

Collections

This file is in the following collections:

Theses and Dissertations

Clustering Survival Data using Random Forest and Persistent Homology (Open Access)

Descriptions

Other title
Subject/Keyword
Clustering
Survival
Data
Type of item
Thesis
Degree grantor
University of Alberta
Author or creator
Wubie, Berhanu A.
Supervisor and department
Giseon Heo (Medicine and Dentistry)
Examining committee member and department
Linglong Kong (Statistics)
Bei Jiang (Statistics)
Russ Greiner (Computing Science)
Department
Department of Mathematical and Statistical Sciences
Specialization
Biostatistics
Date accepted
2016-09-30T10:23:03Z
Graduation date
2016-06:Fall 2016
Degree
Master of Science
Degree level
Master's
Abstract
Survival data are most often analyzed with the Cox proportional hazards model to identify factors associated with patients' survival time. Recently, however, random survival forest (RSF), a non-parametric ensemble method constructed by bagging classification trees for survival data, has been used as an alternative for better survival prediction and for ranking the importance of the associated covariates. In addition to identifying variable importance for survival prediction, this thesis explores clusters in survival data using the variables identified as important in the RSF analysis. Clustering of survival data (patients) to assess their survival experience was investigated using two methods: random forest clustering based on partitioning around medoids, and persistent homology (PH), a topological data analysis (TDA) technique, for cluster identification in lower dimension (dimension zero). With both methods, we were able to identify groups of patients with different survival experience, accounting for the covariates most important in determining that experience. The resulting clusters were assessed for significant differences in survival experience (log-rank test) and were found to differ. PH was then used to explore more detailed characteristic features of patients at a higher dimension (dimension one). Both clustering methods offer a promising way to explore groups within patients, giving insight into patient handling and valuable information for providing quality service to patients who need more attention. All analyses in this thesis were carried out on two datasets: the kidney and liver datasets.
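As an illustration of the dimension-zero persistent homology idea mentioned in the abstract, the following is a minimal sketch and not the thesis's own implementation. It assumes a Vietoris-Rips filtration in which an edge appears once the distance between two points is at most the scale, and it uses plain Euclidean distance on made-up toy coordinates; the thesis itself works with patient data and may use a different dissimilarity. The names `h0_persistence` and `clusters_at_scale` are hypothetical helpers introduced here.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def h0_persistence(points):
    """Return the sorted death scales of dimension-zero classes.

    Every point starts as its own component at scale 0. A component dies
    when an edge first merges it into another component (Kruskal-style
    union-find). The single component that never dies is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


def clusters_at_scale(points, scale):
    """Count connected components when points within `scale` are joined."""
    if not points:
        return 0
    return 1 + sum(1 for d in h0_persistence(points) if d > scale)


# Hypothetical toy data: two well-separated pairs of points.
toy_points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
deaths = h0_persistence(toy_points)
n_clusters = clusters_at_scale(toy_points, scale=5.0)
```

In this sketch, a large gap between consecutive death scales suggests a natural number of clusters: components that persist across a wide range of scales are treated as distinct groups.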
Language
English
DOI
doi:10.7939/R34Q7R14G
Rights
This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
Citation for previous publication

File Details

Date Uploaded
Date Modified
2016-09-30T21:23:26.054+00:00
Characterization
File format: pdf (PDF/A)
Mime type: application/pdf
File size: 3345810
Last modified: 2016:11:16 14:30:52-07:00
Filename: Wubie_Berhanu_A_201609_MSc.pdf
Original checksum: 9a8fb7ed95f66874f6a4354644c2c328
Well formed: true
Valid: true
File title: Berhanu's thesis.pdf
File author: Berhanu Wubie
Page count: 110