Faster Gradient-TD Algorithms
-
- Author / Creator
- Hackman, Leah M
-
Gradient-TD methods are a new family of learning algorithms that are stable and convergent under a wider range of conditions than previous reinforcement learning algorithms. In particular, gradient-TD algorithms enable learning on off-policy problems---problems where the distribution of the data is different from the distribution the learner seeks to learn about---while using function approximation in a data-efficient, online manner. Despite these positive features, previous empirical work, though limited, suggests that gradient-TD methods are slower than they could be. One example of this slowness arises in on-policy problems, where gradient-TD methods have been shown to be slower than conventional-TD methods in some cases (Maei, 2011). In this thesis, we examine this slowness through on- and off-policy experiments and introduce several variations of existing gradient-TD algorithms in search of "faster" gradient-TD methods. We then introduce hybrid gradient-TD methods, a class of algorithms unique in their ability to use conventional-TD and gradient-TD learning updates when appropriate. We introduce three algorithms, two of which are hybrid gradient-TD methods, and close with the first experimental results for these methods. In particular, we present promising results indicating that one of our new algorithms provides the benefits of a hybrid gradient-TD method while outperforming previous gradient-TD methods.
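The conventional-TD and gradient-TD updates the abstract contrasts can be sketched concretely. The following is a minimal illustration of standard linear TD(0) and GTD2 (Sutton et al., 2009) on a small two-state chain; it is not code from the thesis, and the chain MRP, step sizes, and function names are illustrative assumptions. GTD2 carries a second weight vector `w` that tracks the expected TD error given the features, which is what makes it stable off-policy at the cost of the extra learning process whose slowness the thesis investigates.

```python
import numpy as np

GAMMA = 0.9

# Illustrative MRP (an assumption, not from the thesis): a deterministic
# two-state chain s0 -(r=0)-> s1 -(r=1)-> terminal, with tabular (one-hot)
# features, so both methods share the same fixed point:
# v(s1) = 1, v(s0) = GAMMA * v(s1) = 0.9.
PHI = np.eye(2)
TRANSITIONS = [
    (PHI[0], 0.0, PHI[1]),       # s0 -> s1, reward 0
    (PHI[1], 1.0, np.zeros(2)),  # s1 -> terminal (terminal features are zero)
]

def td0_update(theta, phi, r, phi_next, alpha):
    """Conventional linear TD(0): theta += alpha * delta * phi."""
    delta = r + GAMMA * phi_next @ theta - phi @ theta
    return theta + alpha * delta * phi

def gtd2_update(theta, w, phi, r, phi_next, alpha, beta):
    """GTD2: a second weight vector w estimates the expected TD error
    given the features; theta descends the MSPBE gradient estimate."""
    delta = r + GAMMA * phi_next @ theta - phi @ theta
    theta = theta + alpha * (phi - GAMMA * phi_next) * (w @ phi)
    w = w + beta * (delta - w @ phi) * phi
    return theta, w

theta_td = np.zeros(2)
theta_gtd, w = np.zeros(2), np.zeros(2)
for _ in range(5000):
    for phi, r, phi_next in TRANSITIONS:
        theta_td = td0_update(theta_td, phi, r, phi_next, alpha=0.1)
        theta_gtd, w = gtd2_update(theta_gtd, w, phi, r, phi_next,
                                   alpha=0.05, beta=0.25)

# On-policy with full-rank features, both weight vectors approach [0.9, 1.0].
```

Because this tiny problem is on-policy with tabular features, both methods reach the same values; the point of the sketch is only the shape of the two updates. GTD2's secondary weights `w` must themselves be learned, which is one source of the empirical slowness the thesis examines.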
-
- Graduation date
- Spring 2013
-
- Type of Item
- Thesis
-
- Degree
- Master of Science
-
- License
- This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.