- 25 White, Martha (Computing Science)
- 21 Bowling, Michael (Computing Science)
- 3 Schuurmans, Dale (Computing Science)
- 3 White, Adam (Computing Science)
- 1 Bellemare, Marc (Google Brain)
- 1 Farahmand, Amir-massoud (Computer Science, University of Toronto)
- 17 Reinforcement Learning
- 10 Machine Learning
- 7 Artificial Intelligence
- 4 Machine learning
- 4 Reinforcement learning
- 3 Exploration
-
Spring 2020
In model-based reinforcement learning, planning with an imperfect model of the environment has the potential to harm learning progress. But even when a model is imperfect, it may still contain information that is useful for planning. In this thesis, we investigate the idea of using an imperfect...
-
Fall 2020
For artificially intelligent learning systems to be deployed widely in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is challenging. Even finding approximately optimal joint policies of decentralized partially observable Markov...
-
Strange springs in many dimensions: how parametric resonance can explain divergence under covariate shift.
Fall 2021
Most convergence guarantees for stochastic gradient descent with momentum (SGDm) rely on independently and identically distributed (iid) data sampling. Yet SGDm is often used outside this regime, in settings with temporally correlated inputs such as continual learning and reinforcement learning....
-
Fall 2021
Structural credit assignment in neural networks is a long-standing problem, with a variety of alternatives to backpropagation proposed to allow for local training of nodes. One of the early strategies was to treat each node as an agent and use a reinforcement learning method called REINFORCE to...
-
Spring 2023
AlphaZero is a self-play reinforcement learning algorithm that achieves superhuman play in the games of chess, shogi, and Go via policy iteration. To be an effective policy improvement operator, AlphaZero’s search needs to have accurate value estimates for the states that appear in its search...
-
Fall 2013
Given nothing but the generative model of the environment, Monte Carlo Tree Search techniques have recently shown spectacular results on domains previously thought to be intractable. In this thesis we try to develop generic techniques for temporal abstraction inside MCTS that would allow the...
-
Spring 2014
Efficient, unbiased estimation of agent performance is essential for drawing statistically significant conclusions in multi-agent domains with high outcome variance. Naive Monte Carlo estimation is often insufficient, as it can require a prohibitive number of samples, especially when evaluating...
-
Spring 2018
Decision-making problems with two agents can be modeled as two-player games, and a Nash equilibrium is the basic solution concept describing good play in adversarial games. Computing this equilibrium solution for imperfect-information games, where players have private, hidden information, is...
-
Towards Practical Offline Reinforcement Learning: Sample Efficient Policy Selection and Evaluation
Spring 2024
Offline reinforcement learning (RL) involves learning policies from datasets, rather than online interaction. The dissertation first investigates a critical component in offline RL: offline policy selection (OPS). Given that most offline RL algorithms require careful hyperparameter tuning, we...
-
Spring 2020
Reinforcement learning (RL) is a powerful learning paradigm in which agents can learn to maximize sparse and delayed reward signals. Although RL has had many impressive successes in complex domains, learning can take hours, days, or even years of training data. A major challenge of contemporary...