Permanent link (DOI): https://doi.org/10.7939/R37940T9R


Communities
Computing Science, Department of

Collections
Technical Reports (Computing Science)

Dual Representations for Dynamic Programming and Reinforcement Learning (Open Access)

Descriptions

Author or creator
Wang, Tao
Bowling, Michael
Schuurmans, Dale
Subject/Keyword
Reinforcement Learning
Dynamic Programming
Type of item
Computing Science Technical Report
Computing science technical report ID
TR06-26
Language
English
Description
Technical report TR06-26. We investigate the dual approach to dynamic programming and reinforcement learning, which maintains an explicit representation of stationary distributions rather than value functions. A significant advantage of the dual approach is that it allows one to exploit well-developed techniques for representing, approximating, and estimating probability distributions, without running the risks associated with divergent value-function estimation. A second advantage is that some algorithms that are distinct in the primal for the average-reward and discounted-reward cases become unified under the dual. In this paper, we present a modified dual of the standard linear program that guarantees a globally normalized state-visit distribution is obtained. With this reformulation, we then derive novel dual forms of dynamic programming, including policy evaluation, policy iteration, and value iteration. Moreover, we derive dual formulations of temporal-difference learning to obtain new forms of Sarsa and Q-learning. Finally, we scale these techniques up to large domains by introducing approximation, and develop new approximate off-policy learning algorithms that avoid the divergence problems associated with the primal approach. We show that the dual view yields a viable alternative to standard value-function-based techniques and opens new avenues for solving dynamic programming and reinforcement learning problems.
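The dual policy-evaluation idea summarized in the abstract (maintaining normalized state-visit distributions rather than value functions) can be sketched as follows. This is a minimal illustration, not code from the report: it assumes a known policy-induced transition matrix, and the chain, rewards, and variable names are made up for demonstration.

```python
import numpy as np

# Hypothetical 3-state Markov chain under a fixed policy (illustrative numbers only).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])   # policy-induced transition matrix
r = np.array([1.0, 0.0, 2.0])    # per-state rewards
gamma = 0.9
n = P.shape[0]

# Dual policy evaluation: iterate M <- (1 - gamma) I + gamma P M.
# M converges to (1 - gamma)(I - gamma P)^{-1}; each row of M is a
# globally normalized discounted state-visit distribution (rows sum to 1
# at every iteration, so no divergence can occur).
M = np.eye(n)
for _ in range(2000):
    M = (1 - gamma) * np.eye(n) + gamma * P @ M

# Values are recoverable from the visit distributions: v = M r / (1 - gamma).
v_dual = M @ r / (1 - gamma)

# Primal check: v = (I - gamma P)^{-1} r, the usual Bellman solution.
v_primal = np.linalg.solve(np.eye(n) - gamma * P, r)

print(np.allclose(M.sum(axis=1), 1.0))  # True: rows of M stay distributions
print(np.allclose(v_dual, v_primal))    # True: dual recovers the primal values
```

The point of the sketch is the normalization property: the dual iterate is a stochastic matrix at every step, which is the structural reason the abstract can claim the approach avoids divergent value-function estimation.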
Date created
2006
DOI
doi:10.7939/R37940T9R
License information
Creative Commons Attribution 3.0 Unported
File Details

Date Modified
2014-04-24T22:47:58.378+00:00
Characterization
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 154913 bytes
Last modified: 2015-10-12 13:08:52-06:00
Filename: TR06-26.pdf
Original checksum: 22121dc979cfb09d5b80975af3fd821d
Well formed: true
Valid: true
Page count: 8