- Function approximation (2)
- Temporal difference learning (2)
- Two-timescale stochastic approximation (2)
- Active learning (1)
- Actor-critic methods (1)
- Actor-critic reinforcement learning algorithms (1)
Technical report TR09-13. This article presents a survey of reinforcement learning algorithms for Markov Decision Processes (MDPs). In the first half of the article, the problem of value estimation is considered. Here we start by describing the idea of bootstrapping and temporal difference...
Technical report TR09-10. We present four new reinforcement learning algorithms based on actor-critic, function approximation, and natural gradient ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which...