- Spring 2022. A key problem in the theory of meta-learning is to understand how the task distribution influences transfer risk: the expected error of a meta-learner on a new task drawn from the unknown task distribution. In this work, focusing on fixed design linear regression with Gaussian noise and a...
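The quantity this abstract centres on, transfer risk, can be estimated numerically. The sketch below is a minimal Monte Carlo illustration under assumed conditions (a shared fixed design, Gaussian task distribution, Gaussian noise; the variable names and the exact setup are ours, not the thesis's formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: fixed design X shared across tasks; each task is a
# weight vector w ~ N(w_bar, tau^2 I); observations are y = X w + noise.
n, d = 20, 5
X = rng.standard_normal((n, d))    # fixed design matrix
w_bar = rng.standard_normal(d)     # mean of the task distribution
tau, sigma = 0.5, 0.1              # task spread and observation-noise scale

def transfer_risk(w_hat, n_tasks=2000):
    """Monte Carlo estimate of the expected fixed-design error of a
    predictor w_hat on a new task drawn from the task distribution."""
    ws = w_bar + tau * rng.standard_normal((n_tasks, d))
    per_task = ((X @ (ws - w_hat).T) ** 2).mean(axis=0)  # avg over design points
    return per_task.mean() + sigma ** 2                  # plus irreducible noise

# A meta-learner that has recovered the task-distribution mean does better
# than an arbitrary fixed predictor.
print(transfer_risk(w_bar), transfer_risk(np.zeros(d)))
```

The gap between the two printed risks is one way the task distribution's spread and mean show up in transfer performance.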
- Spring 2020. Reinforcement learning is a formalism for learning by trial and error. Unfortunately, trial and error can take a long time to find a solution if the agent does not efficiently explore the behaviours available to it. Moreover, how an agent ought to explore depends on the task that the agent is...
- Spring 2022. Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, that is, when the policy concentrates its probability mass on sub-optimal actions. Sub-optimal policy saturation may arise from a bad policy initialization or a...
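The saturation problem described above is easy to see numerically: for a softmax policy, the exact gradient of expected reward with respect to the logits shrinks toward zero once the policy commits to one action. A small demonstration (a bandit toy example of ours, not the thesis's experiments):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pg_logits(theta, rewards):
    """Exact policy gradient of expected reward w.r.t. softmax logits:
    grad_j = pi_j * (r_j - E_pi[r])."""
    pi = softmax(theta)
    return pi * (rewards - pi @ rewards)

rewards = np.array([1.0, 0.0, 0.0])  # action 0 is optimal

# Uniform policy: gradients have healthy magnitude.
healthy = pg_logits(np.zeros(3), rewards)
# Policy saturated on sub-optimal action 1: gradients are vanishingly small,
# so escaping the bad initialization is slow.
saturated = pg_logits(np.array([0.0, 10.0, 0.0]), rewards)

print(np.abs(healthy).max(), np.abs(saturated).max())
```

The saturated policy's largest gradient component is several orders of magnitude smaller than the uniform policy's, which is the pathology the abstract refers to.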
- Chasing Hallucinated Value: A Pitfall of Dyna Style Algorithms with Imperfect Environment Models (Spring 2020). In Dyna style algorithms, reinforcement learning (RL) agents use a model of the environment to generate simulated experience. By updating on this simulated experience, Dyna style algorithms allow agents to potentially learn control policies in fewer environment interactions than agents that use...
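The Dyna loop the abstract describes interleaves direct updates from real experience with extra updates from model-simulated experience. A minimal tabular Dyna-Q sketch on a toy chain environment (the environment and hyperparameters are illustrative choices of ours; the thesis studies what goes wrong when the learned model is imperfect, whereas this model is exact):

```python
import random
random.seed(0)

# Toy 1-D chain: reward 1 only at the right end; agent resets to 0 afterwards.
N_STATES, ACTIONS = 5, (-1, +1)

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, float(s2 == N_STATES - 1)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
model = {}                       # (s, a) -> (s', r): learned deterministic model
alpha, gamma, n_plan = 0.5, 0.9, 10

s = 0
for _ in range(300):
    a = random.choice(ACTIONS)   # exploratory behaviour; Q-learning is off-policy
    s2, r = step(s, a)
    # Direct RL update from real experience.
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
    model[(s, a)] = (s2, r)      # record the transition in the model
    # Planning: n_plan extra updates from simulated experience.
    for _ in range(n_plan):
        (ps, pa), (ps2, pr) = random.choice(list(model.items()))
        Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in ACTIONS) - Q[(ps, pa)])
    s = 0 if s2 == N_STATES - 1 else s2

# The greedy policy should move right toward the reward.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)])
```

The planning updates are what let value propagate along the chain with far fewer real steps; if the model were wrong, those same updates would propagate hallucinated value instead, which is the pitfall in the title.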
- Fall 2023. Multilevel action selection is a reinforcement learning technique in which an action is broken into two parts: the type and the parameters. When using multilevel action selection, one must break the action space into multiple subsets. These subsets are typically disjoint...
- Fall 2023. The problem of missing data is omnipresent in a wide range of real-world datasets. When learning and predicting on such data with neural networks, the typical strategy is to fill in or complete the missing values in the dataset, an approach called impute-then-regress. Much less common is to attempt to...
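The impute-then-regress pipeline named above has two distinct stages: fill in the missing entries, then train a standard regressor on the completed matrix. A minimal numpy sketch with mean imputation (our illustrative baseline, not the thesis's method; the thesis explores alternatives to this pipeline):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic regression data with 20% of entries missing completely at random.
n, d = 200, 3
X = rng.standard_normal((n, d))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(n)

X_miss = X.copy()
X_miss[rng.random((n, d)) < 0.2] = np.nan

# Impute step: replace each missing entry with its column's observed mean.
col_means = np.nanmean(X_miss, axis=0)
X_imp = np.where(np.isnan(X_miss), col_means, X_miss)

# Regress step: ordinary least squares on the completed matrix.
w, *_ = np.linalg.lstsq(X_imp, y, rcond=None)
print(np.round(w, 2))
```

The recovered coefficients track the true ones but with extra variance, since the imputed entries carry no information about the targets; that lost information is one motivation for moving beyond this two-stage strategy.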
- Spring 2019. In this thesis we introduce a new loss for regression, the Histogram Loss. There is some evidence that, in the problem of sequential decision making, estimating the full distribution of return offers a considerable gain in performance, even though only the mean of that distribution is used in...
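The core idea behind a histogram-style regression loss is to turn a scalar target into a distribution over fixed bins and train with cross-entropy. The sketch below uses a Gaussian projection of the target onto the bins; the bin range, smoothing width, and names are illustrative assumptions of ours, not the thesis's exact construction:

```python
import numpy as np

bins = np.linspace(-2.0, 2.0, 21)  # fixed bin centres

def target_histogram(y, sigma=0.2):
    """Soft (Gaussian) projection of a scalar target y onto the bins."""
    p = np.exp(-0.5 * ((bins - y) / sigma) ** 2)
    return p / p.sum()

def histogram_loss(logits, y):
    """Cross-entropy between the target histogram and softmax(logits)."""
    q = np.exp(logits - logits.max())
    q /= q.sum()
    return -(target_histogram(y) * np.log(q + 1e-12)).sum()

# The loss is small when predicted mass sits on the right bins and large
# when it sits elsewhere.
good = histogram_loss(10 * target_histogram(0.5), 0.5)
bad = histogram_loss(10 * target_histogram(-1.5), 0.5)
print(good, bad)
```

Training a network's output layer against this loss yields a full predicted distribution, from which the mean can be read off for the final regression estimate.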
- Feature Generalization in Deep Reinforcement Learning: An Investigation into Representation Properties (Fall 2022). In this thesis, we investigate the connection between the properties and the generalization performance of representations learned by deep reinforcement learning algorithms. Much of the earlier work on representation learning for reinforcement learning focused on designing fixed-basis...
- Fall 2022. This thesis investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives,...
- Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences (Fall 2020). Policy gradient methods typically estimate both explicit policy and value functions. The long-extant view of policy gradient methods as approximate policy iteration---alternating between policy evaluation and policy improvement by greedification---is a helpful framework to elucidate algorithmic...
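One concrete way to frame greedification, as in the title, is to reduce a KL divergence from the policy to a Boltzmann distribution over action values, and the two KL directions behave differently. A small numeric illustration of that asymmetry (the values, temperature, and policies are toy choices of ours, not the thesis's analysis):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Boltzmann distribution over action values as the greedified target.
q = np.array([2.0, 1.9, -1.0])   # two near-optimal actions, one bad
target = softmax(q / 0.1)        # low temperature sharpens the target

def forward_kl(pi):   # KL(target || pi): zero-avoiding / mode-covering
    return float((target * np.log(target / pi)).sum())

def reverse_kl(pi):   # KL(pi || target): zero-forcing / mode-seeking
    return float((pi * np.log(pi / target)).sum())

covering = np.array([0.5, 0.5, 1e-6]); covering /= covering.sum()
seeking = np.array([1.0 - 2e-6, 1e-6, 1e-6]); seeking /= seeking.sum()

# Forward KL heavily penalizes a policy that puts near-zero mass on an
# action the target likes; reverse KL penalizes the same policy only mildly.
print(forward_kl(covering), forward_kl(seeking))
print(reverse_kl(covering), reverse_kl(seeking))
```

This zero-avoiding versus zero-forcing behaviour is what makes the choice of KL direction in the greedification step algorithmically meaningful.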