
Representation and General Value Functions

  • Author / Creator
    Sherstan, Craig
  • Research in artificial general intelligence aims to create agents that can learn from their own experience to solve arbitrary tasks in complex and dynamic settings. To do so effectively and efficiently, such an agent must be able to predict how its environment will change both dependently and independently of its own actions. General value functions (GVFs) are one approach to representing such relationships. A single GVF poses a predictive question defined by three components: a behavior (policy), a prediction timescale, and a predicted signal (cumulant). Estimated answers to these questions can be learned efficiently from the agent's own experience using temporal-difference learning methods. The agent's collection of GVF questions and corresponding answers can be viewed as forming a predictive model of the agent's interaction with its environment. Ultimately, such a model may enable an agent to understand its environment and make decisions therein.
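
    To fix notation for readers unfamiliar with GVFs, the standard definition from the GVF literature is sketched below in LaTeX; the symbols are illustrative and may differ from the conventions used in the dissertation itself. The prediction target is the expected cumulant-weighted sum of the future signal, where actions follow the question's policy and the (possibly state-dependent) continuation function encodes the prediction timescale:

        % General value function (GVF), as commonly written in the GVF literature:
        % \pi is the question's policy, \gamma the continuation (timescale) function,
        % and c the cumulant (predicted signal). Notation is illustrative only.
        v_{\pi,\gamma,c}(s) \doteq
          \mathbb{E}_{\pi}\!\left[
            \sum_{k=0}^{\infty}
              \Bigl(\prod_{j=1}^{k} \gamma(S_{t+j})\Bigr)\, c_{t+k+1}
            \;\middle|\; S_t = s
          \right]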

    Although GVFs are a promising approach, current understanding of their construction and use remains limited. This dissertation explores several aspects of GVF usage and representation; these explorations can be grouped into two areas of research. The first area concerns what information can be represented by GVFs. We suggest that the GVF format might be used by an agent for performing introspection or self-examination. Specifically, we propose using internally generated signals as cumulants for introspective GVFs and argue that such predictions can enhance an agent's state representation. We explore the behavior of several introspective signals in various domains. We then present a new algorithm that uses a series of GVFs to directly estimate the variance of the return---the sum of the future signal. The variance of the return can be viewed as one such introspective signal, capable of enhancing an agent's decision-making process.
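
    A minimal tabular sketch of the direct-variance idea is given below, assuming the formulation reported in the related publications: the variance of the return itself obeys a Bellman-style relation whose cumulant is the squared TD error of the value estimate and whose discount is the square of the original discount. All names and the data format here are illustrative assumptions, not the dissertation's implementation.

        import numpy as np

        def td_value_and_variance(transitions, n_states, gamma=0.9,
                                  alpha=0.1, alpha_var=0.1):
            """Tabular TD(0) sketch: learn a value estimate V and, alongside it,
            a direct estimate of the variance of the return. The variance learner
            is itself a GVF whose cumulant is the squared TD error and whose
            discount is gamma**2 (an assumption based on the direct-variance
            literature). `transitions` is an iterable of
            (state, cumulant, next_state, done) tuples.
            """
            V = np.zeros(n_states)     # value estimate (first GVF)
            Var = np.zeros(n_states)   # variance-of-return estimate (second GVF)
            for s, c, s_next, done in transitions:
                g = 0.0 if done else gamma
                delta = c + g * V[s_next] - V[s]   # TD error of the value GVF
                V[s] += alpha * delta
                # Variance GVF: cumulant delta**2, discount g**2.
                delta_var = delta ** 2 + (g ** 2) * Var[s_next] - Var[s]
                Var[s] += alpha_var * delta_var
            return V, Var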

    This dissertation's second focus is on improving the representations used to estimate the answers to GVFs. Value functions can be factored into two components, one representing the signal of interest and the other representing the dynamics of the environment---the successor representation. We show that in the context of a constructive GVF framework, in which new GVF targets are identified over time, using the successor representation can speed up learning of newly added targets. Next, we introduce Γ-nets, which enable a single GVF estimator to make predictions for any fixed timescale within the training bounds, improving the tractability of learning and representing vast numbers of predictions. Finally, we present investigations into how GVFs, including Γ-nets, can be used as auxiliary tasks to improve representation learning.
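
    The successor-representation factorization can be illustrated in the tabular, known-dynamics case, where it has a closed form: the value of any cumulant is the successor matrix applied to that cumulant's expected one-step signal, so a newly added prediction target only requires estimating a new cumulant vector while the successor matrix is reused. This sketch assumes known transition probabilities and a fixed discount; the dissertation's setting learns these quantities incrementally under function approximation.

        import numpy as np

        def successor_representation(P, gamma):
            """Tabular successor representation under a fixed policy.
            P[s, s'] is the state-to-state transition probability under the
            policy and gamma is a constant discount, so
            M = (I - gamma * P)^{-1} gives the expected discounted future
            state occupancies.
            """
            n = P.shape[0]
            return np.linalg.inv(np.eye(n) - gamma * P)

        # Once M is computed (or learned), the prediction for any new cumulant
        # c (the expected one-step signal per state) is a single matrix product,
        # which is why new GVF targets can be added cheaply.
        P = np.array([[0.9, 0.1],
                      [0.2, 0.8]])       # hypothetical two-state dynamics
        M = successor_representation(P, gamma=0.95)
        c_new = np.array([0.0, 1.0])     # hypothetical newly added cumulant
        v_new = M @ c_new                # value predictions for the new target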

    In summary, this dissertation provides new perspectives, algorithms, and empirical evaluations that we believe will benefit broader work on predictive approaches to artificial general intelligence. Specifically, our work on what information GVFs can represent provides a direction for future research on introspection, and our work on representing GVF estimates demonstrates methods that make learning and representing large collections of GVFs more tractable for real-world problems.

  • Graduation date
    Fall 2020
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-8bev-ap57
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.