Permanent link (DOI): https://doi.org/10.7939/R34P5W

Communities

This file is in the following communities:

Graduate Studies and Research, Faculty of

Collections

This file is in the following collections:

Theses and Dissertations

Regularization in Reinforcement Learning (Open Access)

Descriptions

Subject/Keyword
Statistical Learning Theory
Regularized Least-Squares Regression
Regularized Fitted Q-Iteration
Regularized LSTD
Regularized Policy Iteration
Approximate Value/Policy Iteration
Error Propagation
Regularization
Model Selection
Machine Learning
Reinforcement Learning
Sequential Decision-Making Problems
Type of item
Thesis
Degree grantor
University of Alberta
Author or creator
Farahmand, Amir-massoud
Supervisor and department
Szepesvari, Csaba (Computing Science)
Jagersand, Martin (Computing Science)
Examining committee member and department
Schuurmans, Dale (Computing Science)
Sutton, Richard S. (Computing Science)
Melnikov, Alexander (Mathematical and Statistical Sciences)
Bartlett, Peter L. (UC Berkeley - Division of Computer Science, Department of Statistics)
Bowling, Michael (Computing Science)
Department
Department of Computing Science
Date accepted
2011-09-30T18:51:44Z
Graduation date
2011-11
Degree
Doctor of Philosophy
Degree level
Doctoral
Abstract
This thesis studies reinforcement learning and planning problems that are modeled by a discounted Markov Decision Process (MDP) with a large state space and a finite action space. We follow the value-based approach, in which a function approximator is used to estimate the optimal value function. The choice of function approximator, however, is nontrivial, as it depends on both the number of data samples and the MDP itself. The goal of this work is to introduce flexible and statistically efficient algorithms that find close-to-optimal policies for these problems without much prior information about them. The recurring theme of this thesis is the application of the regularization technique to design value function estimators that choose their estimates from rich function spaces. We introduce regularization-based Approximate Value/Policy Iteration algorithms, analyze their statistical properties, and provide upper bounds on the performance loss of the resulting policy compared to the optimal one. The error bounds show the dependence of the performance loss on the number of samples, the capacity of the function space to which the estimated value function belongs, and some intrinsic properties of the MDP itself. Remarkably, the dependence on the number of samples in the task of policy evaluation is minimax optimal. We also address the problem of automatic parameter tuning of reinforcement learning/planning algorithms and introduce a complexity regularization-based model selection algorithm. We prove that the algorithm enjoys an oracle-like property and that it may be used to achieve adaptivity: the performance is almost as good as the performance of the unknown best parameters. Our two other contributions are used to analyze the aforementioned algorithms. First, we analyze the rate of convergence of the estimation error in regularized least-squares regression when the data are exponentially beta-mixing. We prove that, up to a logarithmic factor, the convergence rate is the same as the optimal minimax rate available for the i.i.d. case. Second, we attend to the question of how the errors at each iteration of approximate policy/value iteration influence the quality of the resulting policy. We provide results that highlight some new aspects of these algorithms.
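As an illustration of the regularized fitted Q-iteration idea summarized above, the sketch below (not taken from the thesis) fits an action-value function to Bellman-backup targets by penalized least squares at each iteration. A ridge (L2) penalty on a linear feature map stands in for the richer RKHS-type penalties analyzed in the thesis; the feature map phi, the toy dynamics, and all parameter values are hypothetical.

# A minimal, illustrative sketch of regularized fitted Q-iteration with a ridge (L2)
# penalty. The feature map, data, and parameters are hypothetical examples only.
import numpy as np

def ridge_fitted_q_iteration(S, A, R, S_next, phi, n_actions,
                             gamma=0.99, lam=1e-2, n_iters=50):
    """Approximate Q(s, a) by phi(s, a)^T w via iterated penalized least squares.

    S, A, R, S_next : arrays of sampled transitions (s, a, r, s').
    phi             : feature map phi(s, a) -> 1-D feature vector (hypothetical).
    lam             : ridge regularization coefficient (controls complexity).
    """
    X = np.array([phi(s, a) for s, a in zip(S, A)])   # design matrix over (s, a)
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(n_iters):
        # Bellman-backup targets: r + gamma * max_a' phi(s', a')^T w
        q_next = np.array([[phi(s2, a2) @ w for a2 in range(n_actions)]
                           for s2 in S_next])
        y = R + gamma * q_next.max(axis=1)
        # Ridge-regularized least-squares fit of the targets
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return w

# Toy usage on a two-state, two-action chain with one-hot features (illustrative).
def phi(s, a, n_states=2, n_actions=2):
    v = np.zeros(n_states * n_actions)
    v[s * n_actions + a] = 1.0
    return v

rng = np.random.default_rng(0)
S = rng.integers(0, 2, size=200)
A = rng.integers(0, 2, size=200)
S_next = (S + A) % 2                      # deterministic toy dynamics
R = (S_next == 1).astype(float)           # reward for reaching state 1
w = ridge_fitted_q_iteration(S, A, R, S_next, phi, n_actions=2, gamma=0.9)
print("Estimated Q-weights:", w)

The regularization coefficient lam plays the role of the complexity parameter whose automatic tuning is addressed by the thesis's model selection algorithm; here it is simply fixed by hand.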
Language
English
DOI
doi:10.7939/R34P5W
Rights
License granted by Amir massoud Farahmand (amirf@ualberta.ca) on 2011-09-28T23:29:09Z (GMT): Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of the above terms. The author reserves all other publication and other rights in association with the copyright in the thesis, and except as herein provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

File Details

Date Modified
2014-04-29T18:11:49.433+00:00
Audit Status
Audits have not yet been run on this file.
Characterization
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 1472901 bytes
Last modified: 2015-10-12 19:15:19-06:00
Filename: Farahmand_Amir-massoud_Fall 2011.pdf
Original checksum: d59bd546470f6011df15f8841339c645
Well formed: false
Valid: false
Status message: No document catalog dictionary offset=0
File title: Regularization in Reinforcement Learning
File author: Amir-massoud Farahmand