Ensemble Based Ultrahigh Dimensional Variable Screening
-
- Author / Creator
- Dong Yang
-
With the development of modern technology, researchers in various fields are equipped with powerful tools to collect ultrahigh dimensional data, where the number of features p can grow exponentially with the sample size n. Extracting useful information is challenging because of the huge number of features. To tackle this challenge, Fan and Lv [14] proposed a two-scale approach in which a variable screening procedure is applied first, instead of traditional one-scale variable selection. The purpose of variable screening is to eliminate as many noisy features as possible while keeping all the important features. Many variable screening methods work well under various assumptions. However, most of them are not stable, in the sense that a small perturbation of the sample may result in very different selected features. Moreover, it is difficult to verify all the assumptions in practice. Therefore, a generic guideline is desired for selecting appropriate screening methods for different applications. A natural choice is to combine multiple screening methods so as to adapt to more general assumptions. In this thesis, we propose a group of ensemble methods to aggregate results from multiple screening methods. Our methods are capable of providing stable results and work well even if some of the candidate screening methods fail. In particular, we propose three ensemble approaches to encourage stability, namely, parallel ensemble screening, quantile ensemble screening and multi ensemble screening. We show that each of the proposed procedures has the sure screening property, which means that the selected set contains the true active variables with probability tending to one, provided each of the combined methods has the sure screening property. We validate our methods through both simulation studies and real data analysis.
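As a rough illustration of what aggregating several screening methods can look like, the sketch below ranks features by two marginal utilities and combines the ranks through a quantile. Everything in it is an assumption made for illustration: the function names, the choice of Pearson and Spearman marginal correlations as candidate screeners, and the rank-quantile aggregation rule are hypothetical and are not taken from the thesis, whose parallel, quantile and multi ensemble procedures may be defined differently.

```python
import numpy as np


def rank_by_score(scores):
    """Convert utility scores to ranks (1 = largest score)."""
    order = np.argsort(-scores)  # feature indices, largest score first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks


def pearson_screen(X, y):
    """Absolute marginal Pearson correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom


def spearman_screen(X, y):
    """Absolute marginal rank correlation (Pearson correlation of ranks)."""
    Xr = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    yr = np.argsort(np.argsort(y)).astype(float)
    return pearson_screen(Xr, yr)


def quantile_rank_ensemble(X, y, screeners, d, q=0.5):
    """Keep the d features whose q-th quantile of ranks across screeners is smallest.

    Hypothetical aggregation rule, used here only to illustrate the idea
    of combining several screening methods.
    """
    ranks = np.stack([rank_by_score(s(X, y)) for s in screeners])  # (methods, p)
    combined = np.quantile(ranks, q, axis=0)
    return np.sort(np.argsort(combined)[:d])
```

A call such as `quantile_rank_ensemble(X, y, [pearson_screen, spearman_screen], d=20)` returns the indices of the 20 retained features. In this sketch a smaller `q` lets a feature survive if only a few screeners rank it highly, while a larger `q` requires broader agreement among them.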
- Subjects / Keywords
-
- Graduation date
- Fall 2018
-
- Type of Item
- Thesis
-
- Degree
- Master of Science
-
- License
- Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.