-
Fall 2018
Temporal-difference (TD) learning is an important approach for predictive knowledge representation and sequential decision making. Within TD learning exist multi-step methods, which unify one-step TD learning and Monte Carlo methods such that intermediate algorithms can outperform either...
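The multi-step idea this excerpt describes can be sketched with a tabular n-step TD value update; the toy episode, step size, and discount below are illustrative assumptions, not details from the thesis.

```python
# Sketch of a tabular n-step TD value update on a toy episode.
# Setting n=1 recovers one-step TD(0); n >= episode length recovers
# a Monte Carlo update, matching the unification described above.

def n_step_td_update(V, trajectory, n, alpha=0.1, gamma=0.9):
    """Update state values V in place from a finished episode.

    trajectory: list of (state, reward) pairs, where reward is the
    reward received on leaving that state; the episode terminates
    after the last pair.
    """
    T = len(trajectory)
    for t in range(T):
        # n-step return: discounted rewards, plus a bootstrapped
        # tail value when the episode extends past t + n
        G = 0.0
        for k in range(min(n, T - t)):
            _, r_k = trajectory[t + k]
            G += (gamma ** k) * r_k
        if t + n < T:
            G += (gamma ** n) * V[trajectory[t + n][0]]
        s_t = trajectory[t][0]
        V[s_t] += alpha * (G - V[s_t])
    return V

V = {"A": 0.0, "B": 0.0, "C": 0.0}
episode = [("A", 0.0), ("B", 0.0), ("C", 1.0)]  # reward 1 at the end
n_step_td_update(V, episode, n=2)
```

With n=2, the states nearer the terminal reward are credited immediately, while state A still relies on the (initially zero) bootstrapped value of C.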
-
Fall 2022
In most, if not all, realistic sequential decision-making tasks, the decision-making agent is not able to model the full complexity of the world. In reinforcement learning, the environment is often much larger and more complex than the agent, a setting also known as partial observability. In...
-
Fall 2021
Reinforcement learning (RL) is a learning paradigm focusing on how agents interact with an environment to maximize cumulative reward signals emitted from the environment. The exploration-versus-exploitation challenge is critical in RL research: the agent must trade off between taking the known...
-
Fall 2022
Imperfect-information games model many large-scale real-world problems. Hex is a classic two-player zero-sum no-draw connection game in which each player wants to join their two sides. Dark Hex is an imperfect-information version of Hex in which each player sees only their own moves. Finding Nash...
-
Ensembling Diverse Policies Improves Generalization of Deep Reinforcement Learning Algorithms to Environmental Changes in Continuous Control Tasks
Fall 2023
Deep Reinforcement Learning (DRL) algorithms have shown great success in solving continuous control tasks. However, they often struggle to generalize to changes in the environment. Although retraining may help policies adapt to changes, it may be quite costly in some environments. Ensemble...
-
Spring 2021
Temporal difference (TD) methods provide a powerful means of learning to make predictions in an online, model-free, and highly scalable manner. In the reinforcement learning (RL) framework, we formalize these prediction targets in terms of a (possibly discounted) sum of rewards, called the...
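The discounted-sum prediction target this excerpt refers to is conventionally written, in standard RL notation (not quoted from the abstract itself), as

$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad 0 \le \gamma \le 1,$$

and the tabular TD(0) update toward this target is

$$V(S_t) \leftarrow V(S_t) + \alpha\left[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\right].$$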
-
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Fall 2020
Policy gradient methods typically estimate both an explicit policy and a value function. The long-standing view of policy gradient methods as approximate policy iteration, alternating between policy evaluation and policy improvement by greedification, is a helpful framework to elucidate algorithmic...
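The forward/reverse KL contrast named in this title can be illustrated numerically on a discrete action distribution. The Boltzmann target, temperature, and Q-values below are illustrative assumptions; the abstract itself does not specify them.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical Boltzmann "greedified" target over three action values
q_values = [1.0, 2.0, 0.5]
tau = 1.0
z = sum(math.exp(q / tau) for q in q_values)
target = [math.exp(q / tau) / z for q in q_values]

policy = [1 / 3, 1 / 3, 1 / 3]  # current (uniform) policy

# Forward KL weights errors by the target (mass-covering);
# reverse KL weights errors by the policy (mode-seeking).
forward_kl = kl(target, policy)
reverse_kl = kl(policy, target)
```

Minimizing one or the other as a greedification objective generally yields different improved policies, which is the kind of asymmetry the thesis investigates.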
-
Fall 2022
Actor-Critics are a popular class of algorithms for control. Their ability to learn complex behaviours in continuous-action environments makes them directly applicable to many real-world scenarios. These algorithms are composed of two parts: a critic and an actor. The critic learns to critique...
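The critic/actor split described here can be sketched in its simplest form: a single-state, two-action problem where the critic is a scalar value estimate and the actor is a softmax over action preferences. The environment, step sizes, and constants are illustrative assumptions only.

```python
# Minimal actor-critic sketch for a one-state, two-action problem.
# Action 1 pays more on average (hypothetical environment), so the
# actor should learn to prefer it.
import math
import random

random.seed(0)

theta = [0.0, 0.0]   # actor: action preferences (softmax policy)
v = 0.0              # critic: value estimate of the single state
alpha_actor, alpha_critic = 0.1, 0.2

def softmax(prefs):
    m = max(prefs)
    e = [math.exp(p - m) for p in prefs]
    s = sum(e)
    return [x / s for x in e]

def reward(action):
    return random.gauss(1.0 if action == 1 else 0.0, 0.1)

for _ in range(2000):
    probs = softmax(theta)
    a = 0 if random.random() < probs[0] else 1
    r = reward(a)
    delta = r - v                  # critic's error (no next state here)
    v += alpha_critic * delta      # critic update
    for i in range(2):             # actor: policy-gradient step
        grad = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += alpha_actor * delta * grad
```

After training, the softmax policy concentrates on action 1 while the critic's estimate tracks the reward obtained under that policy.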
-
Spring 2022
The concept of state is fundamental to a reinforcement learning agent. The state is the input to the agent's action-selection policy, value functions, and environmental model. A reinforcement learning agent interacts with the environment by performing actions and receiving observations, resulting...
-
Spring 2023
The intent of this thesis is to develop a high-performance open-source system that plans with a learned model and to understand the algorithm through extensive analysis. We formulate the problem of maximizing accumulated rewards in Markov Decision Processes, and we frame playing games as such...