-
Spring 2022
Policy gradient (PG) estimators are ineffective at dealing with softmax policies that are sub-optimally saturated, that is, policies that concentrate their probability mass on sub-optimal actions. Sub-optimal policy saturation may arise from a bad policy initialization or a...
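A minimal sketch of the saturation problem the abstract describes (function names are illustrative, not the thesis's): for a softmax policy, the REINFORCE gradient of `reward * log pi(a)` with respect to the logits is `reward * (one_hot(a) - pi)`, so when the policy is saturated on a sub-optimal action, both the chance of sampling the better action and the size of any corrective update become vanishingly small.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_grad(logits, action, reward):
    """Gradient of reward * log pi(action) w.r.t. the logits:
    reward * (one_hot(action) - pi)."""
    pi = softmax(logits)
    return [reward * ((1.0 if i == action else 0.0) - p)
            for i, p in enumerate(pi)]

# A policy saturated on sub-optimal action 0 (logit gap of 10):
# the optimal action 1 is almost never sampled, and even the
# update from sampling action 0 barely moves the logits.
logits = [10.0, 0.0]
pi = softmax(logits)
grad = reinforce_grad(logits, 0, 1.0)
```

Here every component of `grad` is on the order of `pi[1]`, which is about `4.5e-5`, so vanilla PG updates crawl out of the saturated region.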
-
Chasing Hallucinated Value: A Pitfall of Dyna Style Algorithms with Imperfect Environment Models
Spring 2020
In Dyna style algorithms, reinforcement learning (RL) agents use a model of the environment to generate simulated experience. By updating on this simulated experience, Dyna style algorithms allow agents to potentially learn control policies in fewer environment interactions than agents that use...
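The simulated-experience loop the abstract refers to can be sketched as tabular Dyna-Q (an illustrative sketch, not the thesis's algorithm): after each real transition, the agent records it in a learned model and performs extra value updates from transitions replayed out of that model.

```python
import random

def dyna_q(env_step, n_states, n_actions, n_steps=2000,
           planning_steps=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q: learn from real experience, then perform
    planning_steps extra Q-updates on transitions sampled from a
    learned deterministic model."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (s, a) -> (r, s2), the learned environment model
    s = 0
    for _ in range(n_steps):
        # Epsilon-greedy action selection.
        a = random.randrange(n_actions) if random.random() < epsilon \
            else max(range(n_actions), key=lambda x: Q[s][x])
        r, s2 = env_step(s, a)
        # Direct RL update from the real transition.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        model[(s, a)] = (r, s2)
        # Planning: updates on simulated experience from the model.
        for _ in range(planning_steps):
            ps, pa = random.choice(list(model))
            pr, ps2 = model[(ps, pa)]
            Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
        s = s2
    return Q
```

If the learned model is imperfect, the planning loop propagates its errors into Q, which is exactly the "hallucinated value" failure mode the thesis studies.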
-
Fall 2023
Multilevel action selection is a reinforcement learning technique in which an action is broken into two parts: the type and the parameters. When using multilevel action selection in reinforcement learning, one must break the action space into multiple subsets. These subsets are typically disjoint...
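A minimal sketch of the two-level structure described above, with hypothetical action types and parameter ranges (the thesis's actual action space is not given here): the agent first chooses a type, then a parameter from that type's own, disjoint parameter set.

```python
import random

# Hypothetical parameterized action space: each action type owns a
# disjoint set of valid continuous parameters.
ACTION_TYPES = {
    "move": {"param_range": (0.0, 360.0)},  # heading in degrees
    "jump": {"param_range": (0.0, 1.0)},    # jump strength
}

def select_action(type_values, rng=random):
    """Two-level selection: greedy over action types, then a parameter
    drawn from the chosen type's own parameter range."""
    a_type = max(type_values, key=type_values.get)
    lo, hi = ACTION_TYPES[a_type]["param_range"]
    return a_type, lo + (hi - lo) * rng.random()
```

Because the parameter sets are disjoint, the parameter policy for one type never has to assign probability to parameters that only make sense for another type.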
-
Feature Generalization in Deep Reinforcement Learning: An Investigation into Representation Properties
Fall 2022
In this thesis, we investigate the connection between the properties and the generalization performance of representations learned by deep reinforcement learning algorithms. Much of the earlier work on representation learning for reinforcement learning focused on designing fixed-basis...
-
Fall 2022
This thesis investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives,...
-
Fall 2023
Partial observability, when the senses lack enough detail to make an optimal decision, is the reality of any decision-making agent acting in the real world. While an agent could be made to make do with its available senses, taking advantage of the history of senses can provide more context and...
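The simplest version of the idea above is to use a fixed window of recent observations and actions as the agent's state instead of the latest observation alone; a minimal sketch (class and method names are illustrative):

```python
from collections import deque

class HistoryAgentState:
    """Under partial observability, a fixed window of recent
    (observation, action) pairs can serve as the agent state in place
    of the most recent observation alone."""
    def __init__(self, window=4):
        self.window = window
        self.history = deque(maxlen=window)  # drops the oldest pair

    def update(self, observation, action):
        self.history.append((observation, action))
        return self.state()

    def state(self):
        # Left-pad with (None, None) so the state has a fixed shape
        # that a function approximator can consume.
        pad = [(None, None)] * (self.window - len(self.history))
        return tuple(pad + list(self.history))
```

A fixed window is only one choice; recurrent networks and other learned summaries of history serve the same role with an unbounded past.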
-
Spring 2020
In model-based reinforcement learning, planning with an imperfect model of the environment has the potential to harm learning progress. But even when a model is imperfect, it may still contain information that is useful for planning. In this thesis, we investigate the idea of using an imperfect...
-
Fall 2020
For artificially intelligent learning systems to be deployed widely in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is challenging. Even finding approximately optimal joint policies of decentralized partially observable Markov...
-
Fall 2021
Structural credit assignment in neural networks is a long-standing problem, with a variety of alternatives to backpropagation proposed to allow for local training of nodes. One of the early strategies was to treat each node as an agent and use a reinforcement learning method called REINFORCE to...
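The node-as-agent strategy the abstract mentions can be sketched with a single stochastic unit trained by REINFORCE from a global reward, with no backpropagated error signal (an illustrative sketch; names and hyperparameters are not from the thesis):

```python
import math
import random

class ReinforceUnit:
    """One node treated as an RL agent: it emits a Bernoulli 'action'
    from a logistic policy over its inputs, and adjusts its weights by
    REINFORCE using only a scalar reward broadcast to every node."""
    def __init__(self, n_inputs, lr=0.3):
        self.w = [0.0] * n_inputs
        self.lr = lr

    def act(self, x, rng=random):
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(self.w, x))))
        a = 1 if rng.random() < p else 0
        self._x, self._a, self._p = x, a, p  # cache for the update
        return a

    def learn(self, reward):
        # REINFORCE: grad of log Bernoulli(a; p) w.r.t. the logit
        # is (a - p), so each weight moves by lr * reward * (a - p) * x_i.
        for i, xi in enumerate(self._x):
            self.w[i] += self.lr * reward * (self._a - self._p) * xi
```

Trained on a copy task (input `[1.0, x]` with reward +1 when the output matches `x`, -1 otherwise), the unit learns to pass its input through, using only the shared reward for credit assignment.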
-
Fall 2019
In this thesis, we investigate different vector step-size adaptation approaches for continual, online prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad,...
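The per-parameter scaling that the abstract attributes to methods like AdaGrad can be sketched as follows (an illustrative AdaGrad step, not the thesis's proposed method): each weight accumulates its own squared gradients and gets the effective step-size `lr / sqrt(accumulated)`.

```python
import math

def adagrad_update(w, grad, state, lr=0.1, eps=1e-8):
    """One AdaGrad step: scale the update with a vector of step-sizes,
    one per weight, equal to lr / sqrt(sum of squared past gradients)."""
    for i, g in enumerate(grad):
        state[i] += g * g  # per-weight accumulator of squared gradients
        w[i] -= lr * g / (math.sqrt(state[i]) + eps)
    return w
```

On an objective like `w0**2 + 100 * w1**2`, the second coordinate sees much larger gradients, so its accumulator grows faster and its effective step-size shrinks accordingly, which is the vector-step-size behavior that plain stochastic gradient descent lacks.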