  • Handle: http://hdl.handle.net/10402/era.28043
  • Title: Developing and Evaluating Methods for Mitigating Sample Selection Bias in Machine Learning
  • Author: Pelayo Ramirez, Lourdes
  • Language: English
  • Subjects: Machine Learning; Software Reliability; Stratification; Learning in Imbalanced Datasets; Sample Selection Bias; Software Defect Prediction
  • Deposited: Sep 30, 2011 5:16 PM
  • Type: Thesis
  • File: Adobe PDF, 6239882 bytes
  • Abstract: The imbalanced learning problem arises in many economically and medically important domains; consequently, it has drawn significant interest from academia, industry, and government funding agencies. Several researchers have used stratification to alleviate this problem, but it remains unclear which stratification strategy is more effective in general: under-sampling, over-sampling, or a combination of both. Our first topic evaluates the contribution of stratification strategies in software defect prediction. We study the statistical contribution of stratification on the new Mozilla dataset, a large-scale software defect prediction dataset that includes both object-oriented metrics and a count of defects per module. Our second topic addresses the debate over the contributions of over-sampling, under-sampling, and their combination through a full-factorial experiment analyzed with Analysis of Variance (ANOVA) over six software defect prediction datasets. We then extend our research to develop a stratification method that mitigates sample selection bias in function approximation problems. Sample selection bias is present when the training and test instances are drawn from different distributions; the imbalanced dataset problem can be viewed as a particular case of sample selection bias. We extend the well-known SMOTE over-sampling technique to continuous-valued response variables. The new algorithm improves performance on function approximation problems and effectively reduces the impact of sample selection bias.
  • Degree level: Doctoral
  • Degree: Doctor of Philosophy
  • Department: Department of Electrical and Computer Engineering
  • Term: Fall 2011
  • Supervisor: Dick, Scott (Electrical and Computer Engineering)
  • Examining committee: Pedrycz, Witold (Electrical and Computer Engineering); Reformat, Marek (Electrical and Computer Engineering); Sutton, Richard (Computer Science); Denzinger, Joerg (Computer Science, University of Calgary)
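The abstract describes extending SMOTE over-sampling to continuous-valued response variables. As a rough illustration only, not the thesis's actual algorithm, the sketch below shows the core SMOTE interpolation step adapted to regression: classic SMOTE creates a synthetic instance by interpolating features between a seed instance and one of its nearest neighbors, and for a continuous target the response value can be interpolated with the same gap. The function name and interface are hypothetical.

```python
import random


def smote_regression_sample(x, y, neighbor_x, neighbor_y, rng=random):
    """Create one synthetic (features, target) pair by interpolating
    between a seed instance (x, y) and a nearest neighbor
    (neighbor_x, neighbor_y).

    Classic SMOTE interpolates feature vectors only; for a
    continuous-valued response we apply the same random gap to the
    target, so the synthetic point lies on the line segment joining
    the two instances in the joint (feature, target) space.
    """
    gap = rng.random()  # uniform in [0, 1)
    new_x = [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor_x)]
    new_y = y + gap * (neighbor_y - y)
    return new_x, new_y


# Example: interpolate between two rare (under-represented) instances.
rng = random.Random(0)
syn_x, syn_y = smote_regression_sample([0.0, 0.0], 1.0, [1.0, 2.0], 3.0, rng)
```

In a full over-sampling pass, this step would be repeated for each under-represented instance against its k nearest neighbors until the desired stratification ratio is reached.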