
Public Health Applications Using Big Data and Machine Learning Methods: Name- and Location-based Aboriginal Ethnicity Classification and Sentiment Analysis of Breast Cancer Screening in the United States Using Twitter

  • Author / Creator
    Wong, Kai On
  • Applications using big data and machine learning techniques are transforming how people live in the 21st century; however, they remain generally underutilized in public health compared with other domains. We proposed and conducted two independent studies to investigate how big data and machine learning techniques may serve important functions in addressing different public health challenges in North America. In Name- and Location-based Aboriginal Ethnicity Classification, we developed a machine learning method to predict individuals' Aboriginal status from name and location information in the 1901 Canadian census and tested its classification performance. Our automated approach yielded good classification results, especially for Aboriginal (all-inclusive) status and several sub-Aboriginal statuses (such as First Nations, Algonquian, and Kootenay). In the validation sets, classification performance for these four Aboriginal groupings ranged from 0.99-1.00 in accuracy, 0.99-1.00 in area under the ROC curve, 0.63-0.65 in sensitivity, 0.99-1.00 in specificity, 0.78-0.86 in PPV, and 0.99-1.00 in NPV. The demonstrated application showed that using high decision boundary values produced predicted First Nations-specific prevalence statistics that closely approximated the true underlying prevalence. In Sentiment Analysis of Breast Cancer Screening in the United States Using Twitter, we slightly modified the existing VADER sentiment classifier to automatically classify breast cancer screening-related tweets as neutral, positive, or negative. Extensive data visualization was conducted to illustrate the temporal (via a time-series plot), geospatial (via point, hot spot, and quintile maps), and thematic (via word clouds) patterns of breast cancer screening sentiment in the U.S.
The ecological associations between average sentiment scores and the percentage of breast cancer screening uptake at the state level were examined, and significant inverse relationships (p<0.05) were found between negative sentiment and recent uptake of mammograms and clinical breast exams.
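The validation metrics listed in the abstract all derive from a 2×2 confusion matrix obtained by thresholding a classifier's predicted scores at a decision boundary, and the three-way tweet labels come from thresholding VADER's compound score. The Python sketch below illustrates both steps under stated assumptions: the function names (`evaluate`, `label_sentiment`) and the toy data are hypothetical, and the ±0.05 compound-score cut-offs are the defaults suggested in the VADER documentation, not the thesis's modified settings, which are not described here. In practice the compound score would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package; the sketch takes it as a plain number so it runs without that dependency.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float           # TP / (TP + FN)
    specificity: float           # TN / (TN + FP)
    ppv: float                   # TP / (TP + FP), i.e. precision
    npv: float                   # TN / (TN + FN)
    predicted_prevalence: float  # share of records predicted positive


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a denominator is empty.
    return num / den if den else float("nan")


def evaluate(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Metrics:
    """Hypothetical helper: dichotomize scores at `threshold`, then summarize the confusion matrix."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold  # the decision boundary
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    return Metrics(
        accuracy=_ratio(tp + tn, n),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        predicted_prevalence=_ratio(tp + fp, n),
    )


def label_sentiment(compound: float, pos_cut: float = 0.05, neg_cut: float = -0.05) -> str:
    """Hypothetical helper: map a VADER-style compound score in [-1, 1] to a three-way label."""
    if compound >= pos_cut:
        return "positive"
    if compound <= neg_cut:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Toy data (not from the thesis): 3 true positives among 8 records.
    scores = [0.95, 0.80, 0.40, 0.30, 0.10, 0.05, 0.60, 0.20]
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    for t in (0.5, 0.9):
        print(t, evaluate(scores, labels, t))
    print([label_sentiment(c) for c in (0.6, 0.0, -0.4)])
```

Raising the threshold trades sensitivity for specificity and PPV and changes the share of records predicted positive, which is the quantity compared against the true prevalence when choosing a decision boundary.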

  • Subjects / Keywords
  • Graduation date
    Fall 2017
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/R3599ZH9K
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
    English
  • Citation for previous publication
    • Wong, K. O., Davis, F. G., Zaïane, O. R., & Yasui, Y. (2016). Sentiment analysis of breast cancer screening in the United States using Twitter. In KDIR 2016 - 8th International Conference on Knowledge Discovery and Information Retrieval (Vol. 1, pp. 265-274). SciTePress.
  • Institution
    University of Alberta
  • Degree level
    Doctoral
  • Department
  • Specialization
    • Epidemiology
  • Supervisor / co-supervisor and their department(s)