-
Spring 2022
Policy gradient (PG) estimators are ineffective at dealing with softmax policies that are sub-optimally saturated, that is, policies that concentrate their probability mass on sub-optimal actions. Sub-optimal policy saturation may arise from a bad policy initialization or a...
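A minimal sketch of the saturation problem the abstract describes (function names are illustrative, not the thesis's): for a softmax policy, the REINFORCE gradient of `reward * log pi(a)` with respect to the logits is `reward * (one_hot(a) - pi)`, so when the policy is saturated on a sub-optimal action, both the chance of sampling the better action and the size of any corrective update become vanishingly small.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_grad(logits, action, reward):
    """Gradient of reward * log pi(action) w.r.t. the logits:
    reward * (one_hot(action) - pi)."""
    pi = softmax(logits)
    return [reward * ((1.0 if i == action else 0.0) - p)
            for i, p in enumerate(pi)]

# A policy saturated on sub-optimal action 0 (logit gap of 10):
# the optimal action 1 is almost never sampled, and even the
# update from sampling action 0 barely moves the logits.
logits = [10.0, 0.0]
pi = softmax(logits)
grad = reinforce_grad(logits, 0, 1.0)
```

Here every component of `grad` is on the order of `pi[1]`, which is about `4.5e-5`, so vanilla PG updates crawl out of the saturated region.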
-
Chasing Hallucinated Value: A Pitfall of Dyna Style Algorithms with Imperfect Environment Models
Spring 2020
In Dyna style algorithms, reinforcement learning (RL) agents use a model of the environment to generate simulated experience. By updating on this simulated experience, Dyna style algorithms allow agents to potentially learn control policies in fewer environment interactions than agents that use...
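The simulated-experience loop the abstract refers to can be sketched as tabular Dyna-Q (an illustrative sketch, not the thesis's algorithm): after each real transition, the agent records it in a learned model and performs extra value updates from transitions replayed out of that model.

```python
import random

def dyna_q(env_step, n_states, n_actions, n_steps=2000,
           planning_steps=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q: learn from real experience, then perform
    planning_steps extra Q-updates on transitions sampled from a
    learned deterministic model."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (s, a) -> (r, s2), the learned environment model
    s = 0
    for _ in range(n_steps):
        # Epsilon-greedy action selection.
        a = random.randrange(n_actions) if random.random() < epsilon \
            else max(range(n_actions), key=lambda x: Q[s][x])
        r, s2 = env_step(s, a)
        # Direct RL update from the real transition.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        model[(s, a)] = (r, s2)
        # Planning: updates on simulated experience from the model.
        for _ in range(planning_steps):
            ps, pa = random.choice(list(model))
            pr, ps2 = model[(ps, pa)]
            Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
        s = s2
    return Q
```

If the learned model is imperfect, the planning loop propagates its errors into Q, which is exactly the "hallucinated value" failure mode the thesis studies.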
-
Fall 2023
Multilevel action selection is a reinforcement learning technique in which an action is broken into two parts: the type and the parameters. When using multilevel action selection in reinforcement learning, one must break the action space into multiple subsets. These subsets are typically disjoint...
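A minimal sketch of the two-level structure described above, with hypothetical action types and parameter ranges (the thesis's actual action space is not given here): the agent first chooses a type, then a parameter from that type's own, disjoint parameter set.

```python
import random

# Hypothetical parameterized action space: each action type owns a
# disjoint set of valid continuous parameters.
ACTION_TYPES = {
    "move": {"param_range": (0.0, 360.0)},  # heading in degrees
    "jump": {"param_range": (0.0, 1.0)},    # jump strength
}

def select_action(type_values, rng=random):
    """Two-level selection: greedy over action types, then a parameter
    drawn from the chosen type's own parameter range."""
    a_type = max(type_values, key=type_values.get)
    lo, hi = ACTION_TYPES[a_type]["param_range"]
    return a_type, lo + (hi - lo) * rng.random()
```

Because the parameter sets are disjoint, the parameter policy for one type never has to assign probability to parameters that only make sense for another type.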
-
Feature Generalization in Deep Reinforcement Learning: An Investigation into Representation Properties
Fall 2022
In this thesis, we investigate the connection between the properties and the generalization performance of representations learned by deep reinforcement learning algorithms. Much of the earlier work on representation learning for reinforcement learning focused on designing fixed-basis...
-
Fall 2022
This thesis investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives,...
-
Fall 2023
Partial observability, when the senses lack enough detail to make an optimal decision, is the reality of any decision-making agent acting in the real world. While an agent could be made to make do with its available senses, taking advantage of the history of senses can provide more context and...
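The simplest version of the idea above is to use a fixed window of recent observations and actions as the agent's state instead of the latest observation alone; a minimal sketch (class and method names are illustrative):

```python
from collections import deque

class HistoryAgentState:
    """Under partial observability, a fixed window of recent
    (observation, action) pairs can serve as the agent state in place
    of the most recent observation alone."""
    def __init__(self, window=4):
        self.window = window
        self.history = deque(maxlen=window)  # drops the oldest pair

    def update(self, observation, action):
        self.history.append((observation, action))
        return self.state()

    def state(self):
        # Left-pad with (None, None) so the state has a fixed shape
        # that a function approximator can consume.
        pad = [(None, None)] * (self.window - len(self.history))
        return tuple(pad + list(self.history))
```

A fixed window is only one choice; recurrent networks and other learned summaries of history serve the same role with an unbounded past.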
-
Spring 2020
In model-based reinforcement learning, planning with an imperfect model of the environment has the potential to harm learning progress. But even when a model is imperfect, it may still contain information that is useful for planning. In this thesis, we investigate the idea of using an imperfect...
-
Fall 2020
For artificially intelligent learning systems to be deployed widely in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is challenging. Even finding approximately optimal joint policies of decentralized partially observable Markov...
-
Fall 2021
Structural credit assignment in neural networks is a long-standing problem, with a variety of alternatives to backpropagation proposed to allow for local training of nodes. One of the early strategies was to treat each node as an agent and use a reinforcement learning method called REINFORCE to...
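The node-as-agent strategy the abstract mentions can be sketched with a single stochastic unit trained by REINFORCE from a global reward, with no backpropagated error signal (an illustrative sketch; names and hyperparameters are not from the thesis):

```python
import math
import random

class ReinforceUnit:
    """One node treated as an RL agent: it emits a Bernoulli 'action'
    from a logistic policy over its inputs, and adjusts its weights by
    REINFORCE using only a scalar reward broadcast to every node."""
    def __init__(self, n_inputs, lr=0.3):
        self.w = [0.0] * n_inputs
        self.lr = lr

    def act(self, x, rng=random):
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(self.w, x))))
        a = 1 if rng.random() < p else 0
        self._x, self._a, self._p = x, a, p  # cache for the update
        return a

    def learn(self, reward):
        # REINFORCE: grad of log Bernoulli(a; p) w.r.t. the logit
        # is (a - p), so each weight moves by lr * reward * (a - p) * x_i.
        for i, xi in enumerate(self._x):
            self.w[i] += self.lr * reward * (self._a - self._p) * xi
```

Trained on a copy task (input `[1.0, x]` with reward +1 when the output matches `x`, -1 otherwise), the unit learns to pass its input through, using only the shared reward for credit assignment.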
-
Fall 2019
In this thesis, we investigate different vector step-size adaptation approaches for continual, online prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad,...
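The per-parameter scaling that the abstract attributes to methods like AdaGrad can be sketched as follows (an illustrative AdaGrad step, not the thesis's proposed method): each weight accumulates its own squared gradients and gets the effective step-size `lr / sqrt(accumulated)`.

```python
import math

def adagrad_update(w, grad, state, lr=0.1, eps=1e-8):
    """One AdaGrad step: scale the update with a vector of step-sizes,
    one per weight, equal to lr / sqrt(sum of squared past gradients)."""
    for i, g in enumerate(grad):
        state[i] += g * g  # per-weight accumulator of squared gradients
        w[i] -= lr * g / (math.sqrt(state[i]) + eps)
    return w
```

On an objective like `w0**2 + 100 * w1**2`, the second coordinate sees much larger gradients, so its accumulator grows faster and its effective step-size shrinks accordingly, which is the vector-step-size behavior that plain stochastic gradient descent lacks.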