Consistent Emphatic Temporal-Difference Learning

  • Author / Creator
    He, Jiamin
  • Abstract
    Off-policy policy evaluation is a critical and challenging problem in reinforcement learning, and Temporal-Difference (TD) learning is one of the most important approaches for addressing it. There has been significant interest in off-policy TD algorithms that find the same solution that would have been obtained in the on-policy regime. The defining property of these algorithms, which we call consistency, is that their expected update has the same fixed point as that of On-policy TD(λ). Notably, Full IS TD(λ) is the only existing consistent off-policy TD method under general linear function approximation but, unfortunately, it has high variance and is rarely practical. This high variance motivated the introduction of ETD(λ), which reduces the variance but has a biased fixed point. Inspired by these two methods, we propose a new consistent algorithm called Average Emphatic TD (AETD(λ)) with a transient bias, which strikes a balance between bias and variance. Further, we unify AETD(λ) with existing algorithms and obtain a new family of consistent algorithms called Consistent Emphatic TD (CETD(λ, β, ν)), which controls a smooth bias-variance trade-off by varying the speed at which the transient bias fades. Through theoretical analysis and experiments on a didactic example, we establish the consistency of CETD(λ, β, ν) and demonstrate this theoretical advantage empirically. Moreover, we show that CETD(λ, β, ν) converges faster to the lowest error in a complex task with high variance.
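
    The consistency property described above refers to the fixed point of the expected On-policy TD(λ) update under linear function approximation. As a rough illustration only, the sketch below computes that fixed point in closed form using the standard textbook expression A θ = b, with A = Xᵀ D (I − γλP)⁻¹ (I − γP) X and b = Xᵀ D (I − γλP)⁻¹ r. The three-state chain, rewards, features, and function names are all hypothetical and are not taken from the thesis; the sketch does not implement AETD(λ) or CETD(λ, β, ν).

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution of an irreducible chain with transition matrix P."""
    vals, vecs = np.linalg.eig(P.T)
    # Pick the left eigenvector whose eigenvalue is closest to 1.
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


def td_lambda_fixed_point(P, r, X, d, gamma, lam):
    """Solve A theta = b for the expected on-policy TD(lambda) update."""
    n = P.shape[0]
    I = np.eye(n)
    D = np.diag(d)
    # (I - gamma*lam*P)^{-1} accounts for the eligibility-trace weighting.
    trace = np.linalg.inv(I - gamma * lam * P)
    A = X.T @ D @ trace @ (I - gamma * P) @ X
    b = X.T @ D @ trace @ r
    return np.linalg.solve(A, b)


# Hypothetical 3-state Markov chain induced by the target policy (assumption).
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
r = np.array([0.0, 1.0, 2.0])   # expected one-step rewards (assumption)
gamma = 0.9

d = stationary_distribution(P)
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)

# Tabular features: the fixed point coincides with the true value function.
theta_tab = td_lambda_fixed_point(P, r, np.eye(3), d, gamma, lam=0.5)

# Two-dimensional features (assumption): the fixed point depends on lambda.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
theta_lam0 = td_lambda_fixed_point(P, r, X, d, gamma, lam=0.0)
theta_lam1 = td_lambda_fixed_point(P, r, X, d, gamma, lam=1.0)
```

    In this reading, an off-policy method is consistent when its expected update has the same solution θ as the one computed here, even though its data come from a different behaviour policy.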

  • Graduation date
    Fall 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-7skh-z321
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.