
Extending the Sliding-step Technique of Stochastic Gradient Descent to Temporal Difference Learning

  • Author / Creator
    Tian Tian
  • Abstract
    Stochastic gradient descent is at the heart of many recent advances in machine learning. In each of a series of steps, it processes one example and adjusts the weight vector in the direction that would most reduce the error for that example. A step-size parameter controls the size of each adjustment. Together, the step-size parameter, the error, and the direction of adjustment form the update to the weight vector.
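    The update described above can be sketched as follows (a minimal illustration for squared error with a linear model; the function and variable names are ours, not the thesis's):

```python
import numpy as np

def sgd_step(w, x, y, alpha):
    """One SGD step on squared error with a linear model.
    alpha is the step-size parameter; names are illustrative."""
    error = y - w @ x                # error of the current prediction
    return w + alpha * error * x     # adjust in the direction that reduces it

w = sgd_step(np.zeros(2), np.array([1.0, 0.0]), y=2.0, alpha=0.5)
```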

    Importance weighting is a common technique in machine learning in which some examples are weighted more heavily than others. The scalars that weight the examples are called importance weights, and they can vary from time step to time step. The update sizes therefore fluctuate in proportion to the importance weights, which increases the chance of unstable behavior.
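    The instability can be seen in a minimal sketch of the naive weighted update, assuming the importance weight simply multiplies the step size (the numbers are illustrative):

```python
import numpy as np

def weighted_sgd_step(w, x, y, alpha, h):
    """Naive importance-weighted SGD step: the importance weight h
    simply multiplies the step size (illustrative sketch)."""
    return w + alpha * h * (y - w @ x) * x

w = weighted_sgd_step(np.zeros(2), np.array([1.0, 0.0]),
                      y=1.0, alpha=0.5, h=10.0)
# with h = 10 the prediction jumps from 0 to 5, far past the target of 1
```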

    This thesis extends a technique for handling importance weights, developed by Karampatziakis and Langford in 2011, which we refer to as the sliding-step technique. The sliding-step technique produces an update expression that moderates the effect of the importance weights and of the choice of feature vectors on the size of the updates.
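    For squared loss with a linear model, Karampatziakis and Langford derive a closed-form update in which the importance weight enters through a saturating exponential rather than as a raw multiplier; a sketch under that assumption:

```python
import numpy as np

def sliding_step_update(w, x, y, alpha, h):
    """Importance-weight-aware update for squared loss with a linear
    model, following the closed form of Karampatziakis and Langford
    (2011): the limit of many infinitesimal weighted SGD steps."""
    sq = x @ x
    if sq == 0.0:
        return w
    scale = (1.0 - np.exp(-alpha * h * sq)) / sq  # saturates as h grows
    return w + scale * (y - w @ x) * x

w = sliding_step_update(np.zeros(2), np.array([1.0, 0.0]),
                        y=1.0, alpha=0.5, h=10.0)
# the prediction approaches the target of 1 but never overshoots it
```

    Because the scale factor saturates, even a very large importance weight moves the prediction at most to the target, in contrast to the naive multiplied update.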

    The primary contribution of this thesis is to extend sliding-step from the supervised learning setting to the temporal difference learning setting. For simplicity, we restrict our attention to the one-step case, i.e., TD(0), with linear function approximation. We propose a new algorithm, sliding-step TD, and show that in the tabular case it converges with probability one. Our empirical results suggest that sliding-step TD retains many of the favorable properties of the original supervised learning sliding-step algorithms. Finally, we consider applications to emphatic and residual-gradient algorithms, for which importance weights are especially important.
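    The setting the thesis targets can be sketched as the standard TD(0) update with linear function approximation; the exact sliding-step correction is defined in the thesis and is not reproduced here:

```python
import numpy as np

def td0_step(w, x, r, x_next, alpha, gamma, rho=1.0):
    """One TD(0) step with linear function approximation; rho is an
    importance weight (e.g., from off-policy corrections). Sliding-step
    TD, per the thesis, moderates the plain alpha * rho factor; only
    the standard update is sketched here."""
    delta = r + gamma * (w @ x_next) - (w @ x)  # TD error
    return w + alpha * rho * delta * x

w = td0_step(np.zeros(2), np.array([1.0, 0.0]), r=1.0,
             x_next=np.array([0.0, 1.0]), alpha=0.1, gamma=0.9)
```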

  • Subjects / Keywords
  • Graduation date
    Fall 2018
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/R3VM43D2X
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.