Filter

Departments: Department of Computing Science (45)

Languages: English (45)

Item type: Thesis (45)

Selective Dyna-style Planning Using Neural Network Models with Limited Capacity

Spring 2020

Zaheer, Muhammad

In model-based reinforcement learning, planning with an imperfect model of the environment has the potential to harm learning progress. But even when a model is imperfect, it may still contain information that is useful for planning. In this thesis, we investigate the idea of using an imperfect...
Solving Common-Payoff Games with Approximate Policy Iteration

Fall 2020

Sokota, Samuel

For artificially intelligent learning systems to be deployed widely in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is challenging. Even finding approximately optimal joint policies of decentralized partially observable Markov...
Strange springs in many dimensions: how parametric resonance can explain divergence under covariate shift.

Fall 2021

Banman, Kirby

Most convergence guarantees for stochastic gradient descent with momentum (SGDm) rely on independently and identically distributed (iid) data sampling. Yet, SGDm is often used outside this regime, in settings with temporally correlated inputs such as continual learning and reinforcement learning....
Structural Credit Assignment in Neural Networks using Reinforcement Learning

Fall 2021

Gupta, Dhawal

Structural credit assignment in neural networks is a long-standing problem, with a variety of alternatives to backpropagation proposed to allow for local training of nodes. One of the early strategies was to treat each node as an agent and use a reinforcement learning method called REINFORCE to...
Targeted Search Control in AlphaZero for Effective Policy Improvement

Spring 2023

Trudeau, Alexandre

AlphaZero is a self-play reinforcement learning algorithm that achieves superhuman play in the games of chess, shogi, and Go via policy iteration. To be an effective policy improvement operator, AlphaZero’s search needs to have accurate value estimates for the states that appear in its search...
Temporal Abstraction in Monte Carlo Tree Search

Fall 2013

Vafadost, Mostafa

Given nothing but the generative model of the environment, Monte Carlo Tree Search techniques have recently shown spectacular results on domains previously thought to be intractable. In this thesis we try to develop generic techniques for temporal abstraction inside MCTS that would allow the...
The Baseline Approach to Agent Evaluation

Spring 2014

Davidson, Joshua

Efficient, unbiased estimation of agent performance is essential for drawing statistically significant conclusions in multi-agent domains with high outcome variance. Naive Monte Carlo estimation is often insufficient, as it can require a prohibitive number of samples, especially when evaluating...
Time and Space: Why Imperfect Information Games are Hard

Spring 2018

Burch, Neil

Decision-making problems with two agents can be modeled as two-player games, and a Nash equilibrium is the basic solution concept describing good play in adversarial games. Computing this equilibrium solution for imperfect information games, where players have private, hidden information, is...
Towards Practical Offline Reinforcement Learning: Sample Efficient Policy Selection and Evaluation

Spring 2024

Liu, Vincent

Offline reinforcement learning (RL) involves learning policies from datasets, rather than online interaction. The dissertation first investigates a critical component in offline RL: offline policy selection (OPS). Given that most offline RL algorithms require careful hyperparameter tuning, we...
Useful Policy Invariant Shaping from Arbitrary Advice

Spring 2020

Behboudian, Paniz

Reinforcement learning (RL) is a powerful learning paradigm in which agents can learn to maximize sparse and delayed reward signals. Although RL has had many impressive successes in complex domains, learning can take hours, days, or even years of training data. A major challenge of contemporary...