
Nonconvex Optimization Methods for Large-scale Statistical Machine Learning

  • Author / Creator
    Zhu, Rui
  • The development of modern machine learning systems not only enables applications in various fields but also creates many challenges. The core of machine learning is to train a model on a dataset, which can be posed as solving an optimization problem, usually expressed in terms of carefully chosen losses and regularizers. However, these optimization problems are more complicated than they appear: many of them are not convex in the model parameters, and advanced techniques are needed to solve them. This thesis focuses on developing algorithms that provide solutions to nonconvex optimization problems at different levels.

    Some nonconvex optimization problems have a specific inner structure that allows us to develop convex approximations, so that we can instead solve the convex approximation. This thesis first focuses on an instance of this approach: the matrix and tensor estimation problem. Given a subset of observations, i.e., incomplete entries of a matrix (or tensor), the goal is to find a matrix (or tensor) of minimum rank. We use the Schatten-$p$ norm ($1 \leq p \leq 2$) to approximate the matrix rank, a nonconvex function, and propose the Iterative Reweighted Schatten-$p$ Norm (IS-$p$) method to solve this convex approximation. Extensive performance evaluations driven by synthetic data and real-world latency measurements show that our proposed approach achieves better and more robust performance than multiple state-of-the-art matrix completion algorithms in the presence of noise.

    Secondly, this thesis studies nonconvex optimization problems by exploiting the convexity of the objective function from a local view. In machine learning, many objective functions have a specific inner structure near their global optima, which allows us to find efficient algorithms. The second part focuses on an instance of this approach: matrix factorization for skewed data. Matrix factorization has proved to be a powerful tool in many fields, including recommender systems and web latency estimation. However, traditional methods estimate conditional means, which may be biased in skewed-data settings. We propose Quantile Matrix Factorization (QMF) and Expectile Matrix Factorization (EMF) to overcome this issue. We exploit the local strong convexity around the global optima of EMF and show that, under certain conditions, alternating minimization achieves the global optimum.

    Finally, we study general nonconvex nonsmooth optimization problems that are prevalent in machine learning. Modern applications of machine learning have brought challenges in model complexity and data volume, and the focus here is to design distributed algorithms that solve these problems using multiple worker nodes in parallel. We propose asynchronous optimization methods that allow worker nodes to proceed without waiting for others once they complete their tasks, saving time. We implement the new algorithms on the parameter server system and test them on large-scale real datasets. We successfully demonstrate that our proposed algorithms achieve near-linear speedup and accelerate large-scale distributed machine learning.
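    The abstract references two concrete quantities: the Schatten-$p$ norm used as a rank surrogate, and the asymmetric squared loss that underlies expectile estimation in EMF. A minimal NumPy sketch of both follows; the function names and the specific formulation of the expectile loss are illustrative assumptions, not taken from the thesis itself.

    ```python
    import numpy as np

    def schatten_p_norm(X, p):
        """Schatten-p norm: the l_p norm of the singular values of X.
        For 1 <= p <= 2 it interpolates between the nuclear norm
        (p = 1, a convex surrogate for rank) and the Frobenius norm (p = 2)."""
        s = np.linalg.svd(X, compute_uv=False)
        return float((s ** p).sum() ** (1.0 / p))

    def expectile_loss(r, tau):
        """Asymmetric squared loss used in expectile regression:
        residuals r are weighted by tau when r >= 0 and by (1 - tau)
        when r < 0; tau = 0.5 recovers least squares up to a constant."""
        w = np.where(r < 0, 1.0 - tau, tau)
        return w * r ** 2

    X = np.diag([3.0, 4.0])
    print(schatten_p_norm(X, 1))  # nuclear norm: 3 + 4 = 7
    print(schatten_p_norm(X, 2))  # Frobenius norm: sqrt(9 + 16) = 5
    print(expectile_loss(np.array([2.0, -2.0]), 0.9))  # [3.6, 0.4]
    ```

    Minimizing the Schatten-$p$ norm over matrices consistent with the observed entries gives a convex surrogate for rank minimization; minimizing the expectile loss with an asymmetric weight $\tau \neq 0.5$ is what lets a factorization target the tail of a skewed distribution rather than its conditional mean.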

  • Subjects / Keywords
  • Graduation date
    Spring 2019
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-dwrb-8h16
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.