Much of the focus on finding good representations in reinforcement learning has been on learning complex non-linear predictors of value. Methods like policy gradient, which do not learn a value function and instead represent the policy directly, often need fewer parameters to learn good policies....
Current professional training in medical imaging uses an apprenticeship model in which students follow an established doctor and view their cases, in what is called a practicum. This poses a problem because students are limited to the cases available during their practicum. To resolve this automated...
Technical report TR08-16. We propose a dual approach to dynamic programming and reinforcement learning based on maintaining an explicit representation of visit distributions as opposed to value functions. An advantage of working in the dual is that it allows one to exploit techniques for...
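As a rough illustration of the primal/dual distinction this abstract mentions, the sketch below computes both a value function and a discounted state-visit distribution for a fixed policy on a small Markov chain, then checks that they give the same expected return. It is not taken from the report: the function names, the two-state chain, and the normalisation d = (1 - gamma) * sum_t gamma^t * mu P^t are assumptions chosen for the example.

```python
# Minimal sketch (assumed setup, not the report's algorithm): for a fixed
# policy with transition matrix P, compare the value-function view with the
# discounted visit-distribution view of the same expected return.

def mat_vec(P, x):
    """Return P @ x for a row-major list-of-lists matrix P."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

def vec_mat(x, P):
    """Return x^T @ P as a list."""
    return [sum(x[i] * P[i][j] for i in range(len(P))) for j in range(len(P[0]))]

def value_function(P, r, gamma, iters=2000):
    """Primal view: iterate v <- r + gamma * P v."""
    v = [0.0] * len(r)
    for _ in range(iters):
        Pv = mat_vec(P, v)
        v = [ri + gamma * pv for ri, pv in zip(r, Pv)]
    return v

def visit_distribution(P, mu, gamma, iters=2000):
    """Dual view: d = (1 - gamma) * sum_t gamma^t * (mu^T P^t)."""
    d = [0.0] * len(mu)
    term = list(mu)          # holds mu^T P^t
    weight = 1.0 - gamma     # holds (1 - gamma) * gamma^t
    for _ in range(iters):
        d = [di + weight * ti for di, ti in zip(d, term)]
        term = vec_mat(term, P)
        weight *= gamma
    return d

# Hypothetical two-state chain used only for illustration.
P = [[0.9, 0.1], [0.2, 0.8]]
r = [1.0, 0.0]
mu = [0.5, 0.5]
gamma = 0.9

v = value_function(P, r, gamma)
d = visit_distribution(P, mu, gamma)
primal_return = sum(m * vi for m, vi in zip(mu, v))
dual_return = sum(di * ri for di, ri in zip(d, r)) / (1.0 - gamma)
```

Under these assumptions `d` is a probability distribution over states, and `primal_return` and `dual_return` agree up to floating-point error, which is the sense in which the two representations carry the same information about a policy's return.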
Technical report TR06-26. We investigate the dual approach to dynamic programming and reinforcement learning, based on maintaining an explicit representation of stationary distributions as opposed to value functions. A significant advantage of the dual approach is that it allows one to exploit...
In this thesis, a Reinforcement Learning (RL) method called Sarsa is used to dynamically tune a PI-controller for a Continuous Stirred Tank Heater (CSTH) experimental setup. The proposed approach uses an approximate model to train the RL agent in the simulation environment before implementation...
Off-policy reinforcement learning is useful in many contexts. Maei, Sutton, Szepesvari, and others have recently introduced a new class of algorithms, the most advanced of which is GQ(lambda), for off-policy reinforcement learning. These algorithms are the first stable methods for general...
Technical report TR07-12. One key topic in reinforcement learning is function approximation, which is critical for the success of reinforcement learning in domains with large state spaces. Unfortunately, function approximation can lead to several problems including the suboptimality of the...
This research focuses on developing AI agents that play arbitrary Atari 2600 console games without any game-specific assumptions or prior knowledge. Two main approaches are considered: reinforcement-learning-based methods and search-based methods. The RL-based methods use feature vectors...
We present a new family of gradient temporal-difference (TD) learning methods with function approximation whose complexity, both in terms of memory and per-time-step computation, scales linearly with the number of learning parameters. TD methods are powerful prediction techniques, and with...
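To make the linear-complexity claim concrete, here is a minimal sketch of one widely known gradient-TD update, TDC (TD with gradient correction), with linear function approximation. The abstract does not say which members of the family it covers, so treating TDC as representative is an assumption, as are the function names, step sizes, and the two-state test chain. Each update touches every weight a constant number of times, so memory and per-step computation grow linearly with the number of parameters.

```python
# Minimal sketch (assumed example): one TDC update with linear features.
# theta holds the value weights; w is the auxiliary weight vector that
# gradient-TD methods maintain alongside theta.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tdc_update(theta, w, phi, reward, phi_next, gamma, alpha, beta):
    """Return updated (theta, w) after one transition; O(n) time and memory."""
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
    w_phi = dot(w, phi)
    new_theta = [t + alpha * (delta * f - gamma * w_phi * fn)
                 for t, f, fn in zip(theta, phi, phi_next)]
    new_w = [wi + beta * (delta - w_phi) * f for wi, f in zip(w, phi)]
    return new_theta, new_w

def run_two_state_cycle(steps=20000, gamma=0.5, alpha=0.05, beta=0.1):
    """Hypothetical deterministic chain 0 -> 1 -> 0 with one-hot features.

    Reward is 1 on leaving state 0 and 0 on leaving state 1, so the true
    values satisfy v0 = 1 + gamma * v1 and v1 = gamma * v0.
    """
    features = [[1.0, 0.0], [0.0, 1.0]]
    rewards = [1.0, 0.0]
    theta, w = [0.0, 0.0], [0.0, 0.0]
    state = 0
    for _ in range(steps):
        nxt = 1 - state
        theta, w = tdc_update(theta, w, features[state], rewards[state],
                              features[nxt], gamma, alpha, beta)
        state = nxt
    return theta

theta = run_two_state_cycle()
```

With gamma = 0.5 the true values on this chain are v0 = 4/3 and v1 = 2/3, and the learned `theta` should approach them. The example is on-policy and tabular for simplicity; it is meant only to show the shape of the update, not the off-policy setting these methods were designed for.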