Download the full-sized PDF of Online Learning for Linearly Parametrized Control ProblemsDownload the full-sized PDF



Permanent link (DOI):


Export to: EndNote  |  Zotero  |  Mendeley


This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations

Online Learning for Linearly Parametrized Control Problems Open Access


Other title
Online Learning
Reinforcement Learning
Confidence Sets
Linear Bandits
Type of item
Degree grantor
University of Alberta
Author or creator
Abbasi-Yadkori, Yasin
Supervisor and department
Szepesvari, Csaba (Computing Science)
Examining committee member and department
Borkar, Vivek S. (School of Technology and Computer Science)
Huang, Biao (Chemical and Materials Engineering)
Sutton, Richard (Computing Science)
Greiner, Russell (Computing Science)
Schuurmans, Dale (Computing Science)
Department of Computing Science
Statistical Machine Learning
Date accepted
Graduation date
Doctor of Philosophy
Degree level
In a discrete-time online control problem, a learner makes an effort to control the state of an initially unknown environment so as to minimize the sum of the losses he suffers, where the losses are assumed to depend on the individual state-transitions. Various models of control problems have been topic of intensive research for more than 50 years. Despite this, theoretical results for the online variant of the problem remain limited. In this thesis, we study several online control problems, ranging from the simplest state-less problems, also known as bandits, through classical control problems where the system dynamics is linear and the cost is quadratic in the state and the control, to more complex non-linear problems. The main topic is the design of algorithms for these problems and the development of finite-time performance guarantees for the algorithms proposed. A common theme of the problems is that they assume a linear parametric uncertainty. Accordingly, our methods employ a linear-in-the-parameters predictor and construct a confidence set that contains the true unknown parameter with sufficiently high probability. In particular, following the "optimism in the face of uncertainty" principle, the algorithms always use the parameter that gives rise to the lowest expected loss. In this framework, a tighter confidence set immediately results in a better performing online learning method. The first main contribution of the thesis is the construction of smaller confidence sets for the least-squares estimate. To arrive at these confidence sets, we derive a novel tail inequality for vector-valued martingales. Based on this new confidence set, we modify and, consequently, improve the analysis of the algorithm for the linear stochastic bandit problem studied by Auer (2002), Dani et al. (2008), and Rusmevichientong and Tsitsiklis (2010). Our modification improves the regret bound by a logarithmic factor, though experiments show a vast improvement. The second main contribution is the introduction of a novel technique to construct confidence sets, which we call online-to-confidence-set conversion. The technique allows us to construct high-probability confidence sets for linear prediction with correlated inputs given the predictions of any algorithm (e.g., online LASSO, exponentiated gradient algorithm (Kivinen and Warmuth, 1997), online least-squares (Lai et al., 1979, Auer et al., 2002b, Vovk, 2001), p-norm algorithms (Grove et al., 2001, Gentile and Littlestone, 1999)); in general, any algorithm whose objective is to achieve low regret with respect to the quadratic loss while using linear predictors. By construction, the size of the confidence set is directly governed by the regret of the online learning algorithm. As a result of this "reductionist" approach, progress in constructing better algorithms for online prediction problems directly translates into tighter confidence sets. As a demonstration of this new approach to constructing confidence sets, we introduce the sparse variant of linear stochastic bandits and show that a recent online algorithm together with our online-to-confidence-set conversion allows one to derive algorithms that can exploit if the unknown parameter vector determining the expected loss of actions is sparse. In the second part of the thesis, we study the average loss linear quadratic (LQ) control problem with unknown model parameters, also known as the LQ adaptive control problem in the control community. We design an algorithm and prove that its regret up to time T, apart from logarithmic factors, is O(sqrt{T}). To the best of our knowledge, this is the first time that an algorithm designed for this problem is shown to enjoy a sublinear finite-time regret bound. We also show that similar techniques can be employed to design and analyze an algorithm for a more general problem with nonlinear dynamics but linear parametric uncertainty. To the best of our knowledge this is the the first time that regret bounds are derived for these classes of control problems.
Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
Citation for previous publication
Yasin Abbasi-Yadkori and Csaba Szepesvari. Regret bounds for the adaptive control of linear quadratic systems. In Proceedings of the 24th Annual Conference on Learning Theory, 2011.Yasin Abbasi-Yadkori, David Pal, and Csaba Szepesvari. Improved algorithms for linear stochastic bandits. In Proceedings of the Twenty Fifth Annual Conference on Neural Information Processing Systems, 2011.Yasin Abbasi-Yadkori, David Pal, and Csaba Szepesvari. Online-to-confidence-set conversions and application to sparse stochastic bandits. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.

File Details

Date Uploaded
Date Modified
Audit Status
Audits have not yet been run on this file.
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 7911518
Last modified: 2015:10:12 15:21:31-06:00
Filename: thesis_final.pdf
Original checksum: a4ece9bffcccc0328ca5553883338041
Well formed: true
Valid: true
Status message: Too many fonts to report; some fonts omitted. Total fonts = 1257
Page count: 126
Activity of users you follow
User Activity Date