Reinforcement Learning Algorithms for MDPs
-
- Author(s) / Creator(s)
-
Technical report TR09-13. This article presents a survey of reinforcement learning algorithms for Markov Decision Processes (MDPs). The first half of the article considers the problem of value estimation: we begin by describing the idea of bootstrapping and temporal difference learning, then compare incremental and batch algorithmic variants and discuss how the choice of function approximation method affects the success of learning. The second half describes methods that target the problem of learning to control an MDP: online and active learning are discussed first, followed by a description of direct and actor-critic methods. | TRID-ID TR09-13
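The abstract's first half centers on bootstrapping and temporal difference learning for value estimation. As a minimal, hypothetical sketch of that idea (not code from the report itself), the snippet below runs tabular TD(0) on a small random-walk MDP; the function name, state layout, and parameter values are illustrative assumptions.

```python
# Illustrative sketch only: tabular TD(0) value estimation on a small
# hypothetical random-walk MDP (not taken from the report).
import random

def td0_random_walk(n_states=5, episodes=2000, alpha=0.1, gamma=1.0):
    """Estimate state values for a 1-D random walk with absorbing ends.

    States run 0..n_states+1; reaching state 0 terminates with reward 0
    and reaching state n_states+1 terminates with reward 1. All other
    transitions give reward 0.
    """
    V = [0.0] * (n_states + 2)  # value estimates; terminal values stay 0
    for _ in range(episodes):
        s = (n_states + 1) // 2  # start each episode in the middle state
        while 0 < s < n_states + 1:
            s_next = s + random.choice((-1, 1))
            r = 1.0 if s_next == n_states + 1 else 0.0
            # Bootstrapping: update V[s] toward the one-step
            # temporal-difference target r + gamma * V[s_next].
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V

if __name__ == "__main__":
    values = td0_random_walk()
    # For this walk the true values of states 1..5 are 1/6, 2/6, ..., 5/6.
    print([round(v, 2) for v in values[1:-1]])
```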
-
- Date created
- 2009
-
- Subjects / Keywords
-
- Artificial Intelligence
- Monte-Carlo methods
- Reinforcement learning
- Actor-critic methods
- Stochastic approximation
- Markov decision processes
- Active learning
- Overfitting
- Least-squares methods
- Temporal difference learning
- Simulations
- Policy gradient
- Two-timescale stochastic approximation
- Q-learning
- Online learning
- Function approximation
- Natural gradient
- PAC-learning
- Machine Learning
- Planning
- Stochastic gradient methods
- Simulation optimization
- Bias-variance tradeoff
-
- Type of Item
- Report
-
- License
- Attribution 3.0 International