Convex Latent Modeling

  • Author / Creator
  • Most machine learning problems can be posed as solving a mathematical program that describes the structure of the prediction problem, usually expressed in terms of carefully chosen losses and regularizers. However, many machine learning problems yield mathematical programs that are not convex in model parameters, forcing the consideration of heuristic optimization strategies that do not provide guarantees of solution quality. The main focus of this thesis is to develop convex approximations of important non-convex machine learning problems; in particular, new convex formulations for deep latent modelling and robust estimation are developed. Training deep predictive models with latent hidden layers poses a hard computational problem: since the model parameters have to be trained jointly with inference over latent variables, the convexity of the training problem is usually destroyed. This thesis first proposes a novel reformulation of supervised training of a twolayer architecture by introducing a latent feature kernel, which allows a rich set of latent feature representations to be captured while still allowing useful convex formulations via semidefinite relaxation. To tackle the resulting computational problem, efficient training algorithms are developed to exploit the specific structure of the problem and overcome the inadequate scaling of general purpose semidefinite solvers. Promising empirical results are obtained that show useful hidden structure can still be captured even in the presence of convex relaxations. The thesis then shows that the two–layer approach can be extended to handle an arbitrary number of latent layers. To achieve this extension, a novel layer loss is proposed that is jointly convex in the adjacent normalized latent feature kernels. An efficient algorithmic approach is then developed for this extended formulation. Again, promising empirical results are obtained that demonstrate improved capabilities over single latent layer models. These results demonstrate the first fully convex formulation of training a deep architecture with an arbitrary number of hidden layers. A final non-convex problem this thesis addresses is robust regression in presence of outliers. Although the field of robust regression is well established, standard estimators in the robust regression literature are not convex and pose intractable computational problems, while robust estimators proposed in the machine learning literature are only robust under unrealistic assumptions. To address these shortcomings, this thesis proposes a new formulation of robust regression that admits a convex relaxation and efficient training algorithm, while still satisfying nontrivial robustness and consistency guarantees.

  • Subjects / Keywords
  • Graduation date
    2017-06:Spring 2017
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
    • Department of Computing Science
  • Specialization
    • Statistical Machine Learning
  • Supervisor / co-supervisor and their department(s)
    • Schuurmans, Dale (Computing Science)
    • Szepesvari, Csaba (Computing Science)
  • Examining committee members and their departments
    • Crammer, Koby (Electrical Engineering, The Technion)
    • Greiner, Russ (Computing Science)
    • Szepesvari, Csaba (Computing Science)
    • Ray, Nilanjan(Computing Science)
    • Schuurmans, Dale (Computing Science)