
Extending the Sliding-step Technique of Stochastic Gradient Descent to Temporal Difference Learning

  • Author / Creator
    Tian Tian
  • Abstract
    Stochastic gradient descent is at the heart of many recent advances in machine learning. In each of a series of steps, it processes one example and adjusts the weight vector in the direction that would most reduce the error for that example. A step-size parameter controls the size of each adjustment. Together, the step-size parameter, the error, and the direction of adjustment form the update to the weight vector.
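    The update described above can be sketched as follows (a minimal illustration for squared error with a linear model; the function and variable names are ours, not the thesis's):

```python
import numpy as np

def sgd_step(w, x, y, alpha):
    """One SGD step on squared error with a linear model.
    alpha is the step-size parameter; names are illustrative."""
    error = y - w @ x                # error of the current prediction
    return w + alpha * error * x     # adjust in the direction that reduces it

w = sgd_step(np.zeros(2), np.array([1.0, 0.0]), y=2.0, alpha=0.5)
```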

    Importance weighting is a common technique in machine learning in which some examples are weighted more heavily than others. The scalars that weight the examples are called importance weights, and they can vary from time step to time step. The update sizes therefore fluctuate in proportion to the importance weights, which increases the chance of unstable behavior.
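    The instability can be seen in a minimal sketch of the naive weighted update, assuming the importance weight simply multiplies the step size (the numbers are illustrative):

```python
import numpy as np

def weighted_sgd_step(w, x, y, alpha, h):
    """Naive importance-weighted SGD step: the importance weight h
    simply multiplies the step size (illustrative sketch)."""
    return w + alpha * h * (y - w @ x) * x

w = weighted_sgd_step(np.zeros(2), np.array([1.0, 0.0]),
                      y=1.0, alpha=0.5, h=10.0)
# with h = 10 the prediction jumps from 0 to 5, far past the target of 1
```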

    This thesis extends a technique for handling importance weights, developed by Karampatziakis and Langford in 2011, which we refer to as the sliding-step technique. The sliding-step technique produces an update expression that moderates the effect of the importance weights and of the choice of feature vectors on the size of the updates.
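    For squared loss with a linear model, Karampatziakis and Langford derive a closed-form update in which the importance weight enters through a saturating exponential rather than as a raw multiplier; a sketch under that assumption:

```python
import numpy as np

def sliding_step_update(w, x, y, alpha, h):
    """Importance-weight-aware update for squared loss with a linear
    model, following the closed form of Karampatziakis and Langford
    (2011): the limit of many infinitesimal weighted SGD steps."""
    sq = x @ x
    if sq == 0.0:
        return w
    scale = (1.0 - np.exp(-alpha * h * sq)) / sq  # saturates as h grows
    return w + scale * (y - w @ x) * x

w = sliding_step_update(np.zeros(2), np.array([1.0, 0.0]),
                        y=1.0, alpha=0.5, h=10.0)
# the prediction approaches the target of 1 but never overshoots it
```

    Because the scale factor saturates, even a very large importance weight moves the prediction at most to the target, in contrast to the naive multiplied update.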

    The primary contribution of this thesis is to extend sliding-step from the supervised learning setting to the temporal difference learning setting. For simplicity, we restrict our attention to the one-step case, i.e., TD(0), with linear function approximation. We propose a new algorithm, sliding-step TD, and show that in the tabular case it converges with probability one. Our empirical results suggest that sliding-step TD retains many of the favorable properties of the original supervised learning sliding-step algorithms. Finally, we consider applications to emphatic and residual-gradient algorithms, for which importance weights are especially important.
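    The setting the thesis targets can be sketched as the standard TD(0) update with linear function approximation; the exact sliding-step correction is defined in the thesis and is not reproduced here:

```python
import numpy as np

def td0_step(w, x, r, x_next, alpha, gamma, rho=1.0):
    """One TD(0) step with linear function approximation; rho is an
    importance weight (e.g., from off-policy corrections). Sliding-step
    TD, per the thesis, moderates the plain alpha * rho factor; only
    the standard update is sketched here."""
    delta = r + gamma * (w @ x_next) - (w @ x)  # TD error
    return w + alpha * rho * delta * x

w = td0_step(np.zeros(2), np.array([1.0, 0.0]), r=1.0,
             x_next=np.array([0.0, 1.0]), alpha=0.1, gamma=0.9)
```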

  • Subjects / Keywords
  • Graduation date
    Fall 2018
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/R3VM43D2X
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.