This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations

Gradient Temporal-Difference Learning Algorithms (Open Access)


Keywords: Policy Evaluation; Temporal-Difference learning; Reinforcement Learning; Stochastic Gradient-Descent; Value Function Approximation
Degree grantor: University of Alberta
Author or creator: Maei, Hamid Reza
Supervisor and department: Richard S. Sutton (Computing Science)
Examining committee members and departments:
Marek Reformat (Electrical and Computer Engineering)
Csaba Szepesvari (Computing Science)
Geoffrey J. Gordon (Machine Learning Department, Carnegie Mellon University)
Dale Schuurmans (Computing Science)
Department: Department of Computing Science
Degree: Doctor of Philosophy

Abstract: We present a new family of gradient temporal-difference (TD) learning methods with function approximation whose complexity, in both memory and per-time-step computation, scales linearly with the number of learning parameters. TD methods are powerful prediction techniques and, combined with function approximation, form a core part of modern reinforcement learning (RL). However, the most popular TD methods, such as TD(λ), Q-learning, and Sarsa, may become unstable and diverge when combined with function approximation. In particular, convergence cannot be guaranteed for these methods when they are used with off-policy training. Off-policy training (training on data from one policy in order to learn the value of another) is useful in dealing with the exploration-exploitation tradeoff. Because function approximation is needed for large-scale applications, this stability problem is a key impediment to extending TD methods to real-world large-scale problems. The new family of TD algorithms, also called gradient-TD methods, is based on stochastic gradient descent on a Bellman error objective function. We provide convergence proofs for general settings, including off-policy learning with unrestricted features and nonlinear function approximation. Gradient-TD algorithms are online and incremental, and they extend conventional TD methods to off-policy learning while retaining a convergence guarantee and only doubling computational requirements. Our empirical results suggest that many gradient-TD algorithms may be slower than conventional TD on the subset of training cases in which conventional TD methods are sound. Our latest gradient-TD algorithms are "hybrid" in that they become equivalent to conventional TD, in terms of asymptotic rate of convergence, in on-policy problems.
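The abstract does not state any update rule, so the following is only an illustrative sketch of what a linear gradient-TD step can look like. It assumes the TDC ("TD with gradient correction") form from the published gradient-TD literature, with linear value estimates and no importance weighting for off-policy data. The function names, step sizes, and the two-state chain are hypothetical choices made for this example, not taken from the thesis record.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def tdc_update(theta, w, phi, reward, phi_next, gamma, alpha, beta):
    """One linear TDC step (assumed form); mutates theta and w in place.

    theta: primary weights, value estimate is dot(theta, phi)
    w:     secondary weights, same length as theta
    Cost is O(n) time and 2n numbers of memory for n features.
    """
    # Conventional TD error for the observed transition.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Secondary estimate, computed before either weight vector changes.
    phi_w = dot(phi, w)
    for i in range(len(theta)):
        # TD step plus a gradient-correction term built from w.
        theta[i] += alpha * (delta * phi[i] - gamma * phi_next[i] * phi_w)
        # w tracks the expected TD error as a function of the features.
        w[i] += beta * (delta - phi_w) * phi[i]
    return delta


def evaluate_two_state_chain(steps=5000, gamma=0.5, alpha=0.05, beta=0.1):
    """Hypothetical demo: a deterministic chain 0 -> 1 -> 0 -> ...

    Leaving state 0 pays reward 0 and leaving state 1 pays reward 1.
    Features are one-hot, so theta holds one value estimate per state.
    """
    features = [[1.0, 0.0], [0.0, 1.0]]
    rewards = [0.0, 1.0]
    theta = [0.0, 0.0]
    w = [0.0, 0.0]
    state = 0
    for _ in range(steps):
        next_state = 1 - state
        tdc_update(theta, w, features[state], rewards[state],
                   features[next_state], gamma, alpha, beta)
        state = next_state
    return theta


if __name__ == "__main__":
    print(evaluate_two_state_chain())
```

For this chain the true values solve v0 = 0.5 * v1 and v1 = 1 + 0.5 * v0, giving v0 = 2/3 and v1 = 4/3, and the sketch's estimates should approach them under the step sizes shown. The second weight vector `w` is what roughly doubles the per-step cost relative to conventional TD, which is the trade-off the abstract describes.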
License granted by Hamid Maei on 2011-08-29T16:52:17Z (GMT): Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of the above terms. The author reserves all other publication and other rights in association with the copyright in the thesis, and except as herein provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
File Details

File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 2387009 bytes
Last modified: 2015-10-12 15:18:56-06:00
Filename: Hamid_Maei_PhDThesis.pdf
Original checksum: fffce48108d5e31f6639492507d39df6
Well formed: true
Valid: true
Status message: File header gives version as 1.4, but catalog dictionary gives version as 1.3
Status message: Too many fonts to report; some fonts omitted. Total fonts = 1362
File title: PhDThesis.pdf
File author: Hamid Maei
Page count: 125