The predictive representations hypothesis is that representing the state of the world in terms of predictions about the future will result in good generalization. In this thesis, good generalization is specifically quantified by good learning performance in both accuracy and speed when predicting...
Temporal difference (TD) methods provide a powerful means of learning to make predictions in an online, model-free, and highly scalable manner. In the reinforcement learning (RL) framework, we formalize these prediction targets in terms of a (possibly discounted) sum of rewards, called the...
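The prediction target the snippet describes (a possibly discounted sum of rewards) is learned online by bootstrapping from the current estimate. A minimal sketch of a tabular TD(0) update, where the states, transition, and parameter values are illustrative assumptions rather than anything from the thesis:

```python
# Tabular TD(0): nudge V(s) toward the one-step target r + gamma * V(s').
# States "A"/"B", the single transition, alpha, and gamma are all
# illustrative assumptions for this sketch.

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One online TD(0) step: V[s] += alpha * (r + gamma * V[s'] - V[s])."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return V

V = {"A": 0.0, "B": 0.0}
# A single observed transition: A --(reward 1.0)--> B
V = td0_update(V, "A", 1.0, "B")
print(V["A"])  # 0.1 * (1.0 + 0.9 * 0.0 - 0.0) = 0.1
```

Because each update uses only one observed transition and the stored estimates, the method is online and model-free, as the excerpt states.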
Off-policy reinforcement learning is useful in many contexts. Maei, Sutton, Szepesvari, and others have recently introduced a new class of algorithms, the most advanced of which is GQ(lambda), for off-policy reinforcement learning. These algorithms are the first stable methods for general...
Stochastic gradient descent is at the heart of many recent advances in machine learning. In each of a series of steps, stochastic gradient descent processes an example and adjusts the weight vector in the direction that would most reduce the error for that example. A step-size parameter is used...
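The per-example update described above can be sketched for a linear least-squares predictor. The data, function name, and fixed step size here are illustrative assumptions; the thesis concerns the step-size parameter, which this sketch simply holds constant:

```python
import numpy as np

def sgd_step(w, x, y, step_size=0.01):
    """One SGD step on squared error 0.5 * (w @ x - y)**2 for a single
    example: move w opposite the per-example gradient (w @ x - y) * x."""
    error = w @ x - y
    return w - step_size * error * x

w = np.zeros(2)
# Process one example (features x, target y); numbers are illustrative.
w = sgd_step(w, np.array([1.0, 2.0]), 1.0)
print(w)  # error = -1, so w moves to step_size * x = [0.01, 0.02]
```

The `step_size` argument plays the role of the step-size parameter the excerpt mentions: it scales how far each example moves the weight vector.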
Gradient-TD methods are a new family of learning algorithms that are stable and convergent under a wider range of conditions than previous reinforcement learning algorithms. In particular, gradient-TD algorithms enable off-policy problems---problems where the distribution of the data is different...
This thesis consists of two independent projects, each contributing to a central goal of artificial intelligence research: to build computer systems that are capable of performing tasks and solving problems without problem-specific direction from us, their designers. I focus on two formal...
Learning and planning are two fundamental problems in artificial intelligence. The learning problem can be tackled by reinforcement learning methods, such as temporal-difference learning, which updates a value function from real experience, and uses function approximation to generalise across...
Reinforcement learning algorithms are conventionally divided into two approaches: a model-based approach that builds a model of the environment and then computes a value function from the model, and a model-free approach that directly estimates the value function. The first contribution of this...
The backpropagation algorithm is a fundamental algorithm for training modern artificial neural networks (ANNs). However, the backpropagation algorithm is known to perform poorly on changing problems. We demonstrate that it can perform poorly on a clear, generic, changing...
This thesis is offered as a step forward in our understanding of forgetting in artificial neural networks. ANNs are learning systems loosely based on our understanding of the brain and are responsible for recent breakthroughs in artificial intelligence. However, they have been reported to be...