Stable Dynamic Programming and Reinforcement Learning with Dual Representations

  • Author(s) / Creator(s)
  • Description
    Technical report TR07-05. We investigate novel dual algorithms for dynamic programming and reinforcement learning, based on maintaining explicit representations of stationary distributions instead of value functions. In particular, we investigate the convergence properties of standard dynamic programming and reinforcement learning algorithms when they are converted to their natural dual form. Here we uncover advantages of the dual approach: because dual update algorithms estimate normalized probability distributions rather than unbounded value functions, they avoid divergence even in the presence of function approximation and off-policy updates. Moreover, dual update algorithms remain stable in situations where standard value function estimation diverges. (A minimal sketch of the dual-update idea follows the record below.)
  • TRID-ID
    TR07-05

  • Date created
    2007
  • Subjects / Keywords
  • Type of Item
    Report
  • DOI
    https://doi.org/10.7939/R33G0M
  • License
    Attribution 3.0 International
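
The abstract's central claim is that updates carried out on normalized distributions stay bounded, whereas value-function updates can diverge. The sketch below is only a rough illustration of that idea for tabular policy evaluation, not the report's algorithm: it contrasts the primal value-function update with a dual update on a discounted state-visit distribution matrix. The transition matrix `P`, reward vector `r`, discount `gamma`, and iteration count are made-up assumptions.

```python
import numpy as np

# Hypothetical 3-state Markov chain under a fixed policy (illustrative numbers).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([0.0, 1.0, 2.0])   # per-state rewards (illustrative)
gamma = 0.9

# Primal policy evaluation: iterate v <- r + gamma * P v on an unbounded value vector.
v = np.zeros(3)
for _ in range(500):
    v = r + gamma * P @ v

# Dual policy evaluation: maintain a discounted state-visit distribution matrix M,
# whose row i is a normalized distribution over future states when starting in state i.
# Update: M <- (1 - gamma) * I + gamma * P M.  Each row remains a probability
# distribution throughout, so the iterates stay bounded by construction.
M = np.eye(3)
for _ in range(500):
    M = (1 - gamma) * np.eye(3) + gamma * P @ M
    assert np.allclose(M.sum(axis=1), 1.0)   # rows stay normalized

# Values can be recovered from the dual representation: v = M r / (1 - gamma).
v_dual = M @ r / (1 - gamma)
print(np.allclose(v, v_dual))   # True: both iterations reach the same fixed point
```

Each row of M is a probability distribution over states, which is the boundedness property the abstract points to. The report's actual contribution concerns behavior under function approximation and off-policy updates, which this exact tabular sketch does not capture.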