ERA

Permanent link (DOI): https://doi.org/10.7939/R3Q814S5W

Communities

This file is in the following communities:

Computing Science, Department of

Collections

This file is in the following collections:

Technical Reports (Computing Science)

Natural Actor-Critic Algorithms (Open Access)

Descriptions

Author or creator
Bhatnagar, Shalabh
Sutton, Richard
Ghavamzadeh, Mohammad
Lee, Mark
Subject/Keyword
Bootstrapping
Two-timescale stochastic approximation
Temporal difference learning
Approximate dynamic programming
Actor-critic reinforcement learning algorithms
Function approximation
Natural-gradient
Policy gradient methods
Type of item
Report
Language
English
Description
Technical report TR09-10. We present four new reinforcement learning algorithms based on actor-critic, function approximation, and natural gradient ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms. We present empirical results verifying the convergence of our algorithms. (An illustrative sketch of these actor-critic ingredients appears after the record fields below.)
Date created
2009
DOI
doi:10.7939/R3Q814S5W
License information
Creative Commons Attribution 3.0 Unported

File Details

Date Modified
2014-05-01T02:37:06.206+00:00
Characterization
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 568797 bytes
Last modified: 2015-10-12 20:38:06-06:00
Filename: TR09-10.pdf
Original checksum: ffd5cf80f490ce291fd3baafed276cdb
Well formed: true
Valid: true
Page count: 39