
Adaptive Representation for Policy Gradient

Spring 2015

Das Gupta, Ujjwal

Much of the focus on finding good representations in reinforcement learning has been on learning complex non-linear predictors of value. Methods like policy gradient, which do not learn a value function and instead directly represent the policy, often need fewer parameters to learn good policies....
Adaptive Search Control through Meta-Gradient Reinforcement Learning

Spring 2024

Burega, Bradley Thomas

In model-based reinforcement learning, an agent can improve its policy by planning: learning from experience generated by a model. Search control is the problem of determining which starting state should be used to generate this experience. Given a limited planning budget, an agent should be...
Chasing Hallucinated Value: A Pitfall of Dyna Style Algorithms with Imperfect Environment Models

Spring 2020

Jafferjee, Taher

In Dyna-style algorithms, reinforcement learning (RL) agents use a model of the environment to generate simulated experience. By updating on this simulated experience, Dyna-style algorithms allow agents to potentially learn control policies in fewer environment interactions than agents that use...
Efficient Exploration in Reinforcement Learning through Time-Based Representations

Spring 2019

Cholodovskis Machado, Marlos

In the reinforcement learning (RL) problem, an agent must learn how to act optimally through trial-and-error interactions with a complex, unknown, stochastic environment. The actions taken by the agent influence not just the immediate reward it observes but also the future states and rewards it...
Targeted Search Control in AlphaZero for Effective Policy Improvement

Spring 2023

Trudeau, Alexandre

AlphaZero is a self-play reinforcement learning algorithm that achieves superhuman play in the games of chess, shogi, and Go via policy iteration. To be an effective policy improvement operator, AlphaZero’s search needs to have accurate value estimates for the states that appear in its search...
Useful Policy Invariant Shaping from Arbitrary Advice

Spring 2020

Behboudian, Paniz

Reinforcement learning (RL) is a powerful learning paradigm in which agents can learn to maximize sparse and delayed reward signals. Although RL has had many impressive successes in complex domains, learning can take hours, days, or even years of training data. A major challenge of contemporary...