High-dimensional Generalized Robust Regression and Outlier Detection

  • Author / Creator
    Wang, Yibo
  • A great deal of statistical research in recent years has addressed high- and ultrahigh-dimensional settings, where regularized approaches are used extensively. It is widely acknowledged that robust procedures are needed to limit the influence of outliers in high- and ultrahigh-dimensional regression problems. Methods based on least squares regression perform satisfactorily only when the data have symmetric, light-tailed distributions. Quantile regression and least absolute deviation regression have been widely used for data with heavy-tailed errors, but both are less efficient. In this thesis, we address two problems: (i) estimating the regression vector when both outliers and leverage points are present; (ii) identifying the locations of outliers when the observations are contaminated, while also performing robust parameter estimation. For the first problem, we propose two procedures: generalized adaptive robust (GAR) regression and lq robust regression. To this end, we develop a two-step procedure with adaptive weights in the l1-penalty function. We show that both the GAR and lq robust regression estimators possess the oracle properties. For the second problem, we develop a new procedure that performs outlier detection and robust estimation simultaneously. We demonstrate that, under the multivariate regression model, the new methodology yields robust estimates. Extensive simulations and real data examples illustrate that the proposed methods successfully handle outliers in the response and the covariates.

  • Subjects / Keywords
  • Graduation date
    Fall 2021
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.