Download the fullsized PDF
Share
Permanent link (DOI): https://doi.org/10.7939/R3H418
Communities
This file is in the following communities:
Graduate Studies and Research, Faculty of 
Collections
This file is in the following collections:
Theses and Dissertations 
Online Learning for Linearly Parametrized Control Problems Open Access
Descriptions
 Other title
 Subject/Keyword

Online Learning
Reinforcement Learning
Confidence Sets
Linear Bandits
 Type of item
 Thesis
 Degree grantor

University of Alberta
 Author or creator

AbbasiYadkori, Yasin
 Supervisor and department

Szepesvari, Csaba (Computing Science)
 Examining committee member and department

Borkar, Vivek S. (School of Technology and Computer Science)
Huang, Biao (Chemical and Materials Engineering)
Sutton, Richard (Computing Science)
Greiner, Russell (Computing Science)
Schuurmans, Dale (Computing Science)
 Department

Department of Computing Science
 Specialization

Statistical Machine Learning
 Date accepted

20120928T14:04:18Z
 Graduation date

201306
 Degree

Doctor of Philosophy
 Degree level

Doctoral
 Abstract

In a discretetime online control problem, a learner makes an effort to control the state of an initially unknown environment so as to minimize the sum of the losses he suffers, where the losses are assumed to depend on the individual statetransitions. Various models of control problems have been topic of intensive research for more than 50 years. Despite this, theoretical results for the online variant of the problem remain limited. In this thesis, we study several online control problems, ranging from the simplest stateless problems, also known as bandits, through classical control problems where the system dynamics is linear and the cost is quadratic in the state and the control, to more complex nonlinear problems. The main topic is the design of algorithms for these problems and the development of finitetime performance guarantees for the algorithms proposed. A common theme of the problems is that they assume a linear parametric uncertainty. Accordingly, our methods employ a linearintheparameters predictor and construct a confidence set that contains the true unknown parameter with sufficiently high probability. In particular, following the "optimism in the face of uncertainty" principle, the algorithms always use the parameter that gives rise to the lowest expected loss. In this framework, a tighter confidence set immediately results in a better performing online learning method. The first main contribution of the thesis is the construction of smaller confidence sets for the leastsquares estimate. To arrive at these confidence sets, we derive a novel tail inequality for vectorvalued martingales. Based on this new confidence set, we modify and, consequently, improve the analysis of the algorithm for the linear stochastic bandit problem studied by Auer (2002), Dani et al. (2008), and Rusmevichientong and Tsitsiklis (2010). Our modification improves the regret bound by a logarithmic factor, though experiments show a vast improvement. The second main contribution is the introduction of a novel technique to construct confidence sets, which we call onlinetoconfidenceset conversion. The technique allows us to construct highprobability confidence sets for linear prediction with correlated inputs given the predictions of any algorithm (e.g., online LASSO, exponentiated gradient algorithm (Kivinen and Warmuth, 1997), online leastsquares (Lai et al., 1979, Auer et al., 2002b, Vovk, 2001), pnorm algorithms (Grove et al., 2001, Gentile and Littlestone, 1999)); in general, any algorithm whose objective is to achieve low regret with respect to the quadratic loss while using linear predictors. By construction, the size of the confidence set is directly governed by the regret of the online learning algorithm. As a result of this "reductionist" approach, progress in constructing better algorithms for online prediction problems directly translates into tighter confidence sets. As a demonstration of this new approach to constructing confidence sets, we introduce the sparse variant of linear stochastic bandits and show that a recent online algorithm together with our onlinetoconfidenceset conversion allows one to derive algorithms that can exploit if the unknown parameter vector determining the expected loss of actions is sparse. In the second part of the thesis, we study the average loss linear quadratic (LQ) control problem with unknown model parameters, also known as the LQ adaptive control problem in the control community. We design an algorithm and prove that its regret up to time T, apart from logarithmic factors, is O(sqrt{T}). To the best of our knowledge, this is the first time that an algorithm designed for this problem is shown to enjoy a sublinear finitetime regret bound. We also show that similar techniques can be employed to design and analyze an algorithm for a more general problem with nonlinear dynamics but linear parametric uncertainty. To the best of our knowledge this is the the first time that regret bounds are derived for these classes of control problems.
 Language

English
 DOI

doi:10.7939/R3H418
 Rights
 Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
 Citation for previous publication

Yasin AbbasiYadkori and Csaba Szepesvari. Regret bounds for the adaptive control of linear quadratic systems. In Proceedings of the 24th Annual Conference on Learning Theory, 2011.Yasin AbbasiYadkori, David Pal, and Csaba Szepesvari. Improved algorithms for linear stochastic bandits. In Proceedings of the Twenty Fifth Annual Conference on Neural Information Processing Systems, 2011.Yasin AbbasiYadkori, David Pal, and Csaba Szepesvari. Onlinetoconfidenceset conversions and application to sparse stochastic bandits. In Proceedings of the Fourteenth
International Conference on Artificial Intelligence and Statistics, 2011.
File Details
 Date Uploaded
 20120927T19:28:32.000Z
 Date Modified
 20140501T00:08:50.327+00:00
 Audit Status
 Audits have not yet been run on this file.
 Characterization

File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 7911518
Last modified: 2015:10:12 15:21:3106:00
Filename: thesis_final.pdf
Original checksum: a4ece9bffcccc0328ca5553883338041
Well formed: true
Valid: true
Status message: Too many fonts to report; some fonts omitted. Total fonts = 1257
Page count: 126
User Activity  Date 
