-
Spring 2020
The predictive representations hypothesis is that representing the state of the world in terms of predictions about the future will result in good generalization. In this thesis, good generalization is specifically quantified by good learning performance in both accuracy and speed when predicting...
-
Spring 2021
Temporal difference (TD) methods provide a powerful means of learning to make predictions in an online, model-free, and highly scalable manner. In the reinforcement learning (RL) framework, we formalize these prediction targets in terms of a (possibly discounted) sum of rewards, called the...
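The prediction target described here, a (possibly discounted) sum of rewards, can be sketched concretely. A minimal Python illustration, with function names and the discount value chosen for this example only:

```python
def discounted_return(rewards, gamma=0.9):
    # G_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ...
    # Accumulate from the last reward backward, so each pass applies one discount.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

A TD method learns to predict this quantity online, from one transition at a time, rather than waiting for the full sum to be observed.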
-
Spring 2011
Off-policy reinforcement learning is useful in many contexts. Maei, Sutton, Szepesvari, and others have recently introduced a new class of algorithms, the most advanced of which is GQ(lambda), for off-policy reinforcement learning. These algorithms are the first stable methods for general...
-
Extending the Sliding-step Technique of Stochastic Gradient Descent to Temporal Difference Learning
Fall 2018
Stochastic gradient descent is at the heart of many recent advances in machine learning. In each of a series of steps, stochastic gradient descent processes an example and adjusts the weight vector in the direction that would most reduce the error for that example. A step-size parameter is used...
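The per-example update described in this abstract can be sketched for a linear predictor with squared error; the names and the default step size here are illustrative, not taken from the thesis:

```python
def sgd_step(w, x, y, alpha=0.01):
    # Prediction error for this example under squared loss, err^2 / 2
    error = sum(wi * xi for wi, xi in zip(w, x)) - y
    # Adjust each weight opposite the gradient, scaled by the step-size alpha
    return [wi - alpha * error * xi for wi, xi in zip(w, x)]
```

The step-size parameter alpha controls how far each example moves the weight vector.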
-
Spring 2013
Gradient-TD methods are a new family of learning algorithms that are stable and convergent under a wider range of conditions than previous reinforcement learning algorithms. In particular, gradient-TD algorithms enable off-policy problems---problems where the distribution of the data is different...
-
Fall 2023
Of all the capabilities of natural intelligence, one of the most exceptional is the ability to expand upon and refine knowledge of the world through subjective experience. Therefore, a longstanding goal of Artificial Intelligence has been to replicate this success: to enable artificial agents to...
-
Spring 2022
In this dissertation, we study online off-policy temporal-difference learning algorithms, a class of reinforcement learning algorithms that can learn predictions in an efficient and scalable manner. The contributions of this dissertation are of two kinds: (1) empirically studying existing...
-
Spring 2015
This thesis consists of two independent projects, each contributing to a central goal of artificial intelligence research: to build computer systems that are capable of performing tasks and solving problems without problem-specific direction from us, their designers. I focus on two formal...
-
Fall 2009
Learning and planning are two fundamental problems in artificial intelligence. The learning problem can be tackled by reinforcement learning methods, such as temporal-difference learning, which update a value function from real experience, and use function approximation to generalise across...
-
Strengths, Weaknesses, and Combinations of Model-based and Model-free Reinforcement Learning
Spring 2016
Reinforcement learning algorithms are conventionally divided into two approaches: a model-based approach that builds a model of the environment and then computes a value function from the model, and a model-free approach that directly estimates the value function. The first contribution of this...
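The two approaches contrasted here can be illustrated side by side. A minimal tabular sketch with hypothetical names, where P and R stand for a (learned) transition and reward model:

```python
def model_based_backup(P, R, v, s, gamma=0.9):
    # Compute a value from the model: expected reward plus
    # the discounted expected value of the next state.
    return R[s] + gamma * sum(p * v_next for p, v_next in zip(P[s], v))

def model_free_backup(v_s, r, v_next, alpha=0.1, gamma=0.9):
    # Estimate the value directly from one sampled transition, with no model.
    return v_s + alpha * (r + gamma * v_next - v_s)
```

The difference is only where each update gets its target: from the model's expectation, or from a single sampled reward and next-state value.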