Agent-State Construction with Auxiliary Inputs

  • Author / Creator
    Tao, Ruo Yu
  • Abstract
    In most, if not all, realistic sequential decision-making tasks, the decision-making agent cannot model the full complexity of the world. In reinforcement learning, the environment is often much larger and more complex than the agent, a setting also known as partial observability. In such settings, the agent must leverage more than just its current sensory inputs; it must construct an agent state that summarizes its previous interactions with the world. Currently, the most common approach to this problem is to learn the agent-state function with a recurrent network, using the agent's sensory stream as input, which is often augmented with transformations of the agent's observations. These augmentations take multiple forms, from simple approaches like concatenating observations to more complex ones such as uncertainty estimates or predictive representations. Although ubiquitous in the field, these additional inputs, which we term auxiliary inputs, are rarely emphasized, and it is not clear what their role or impact is. In this work we formalize agent-state construction with auxiliary inputs and present several examples of auxiliary inputs that incorporate information from the past, present, and/or future of the agent-environment interaction. We show that auxiliary inputs allow an agent to discriminate between observations that would otherwise be aliased, leading to more expressive features that smoothly interpolate between different states. We empirically evaluate this agent-state construction with different function approximators, using different instantiations of these auxiliary inputs across a variety of tasks. This approach is complementary to state-of-the-art methods such as recurrent neural networks, and acts as a heuristic that facilitates longer temporal credit assignment, reducing the number of time steps needed when performing truncated backpropagation through time and leading to better performance. (A minimal illustrative sketch of this construction follows the record below.)

  • Subjects / Keywords
  • Graduation date
    Fall 2022
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-qtfj-gz10
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
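As a companion to the abstract above, here is a minimal sketch of agent-state construction with one simple auxiliary input: an exponentially decaying trace of past observations concatenated with the current observation. The function names, the decay parameter, and the toy observation stream are illustrative assumptions, not code from the thesis; in the thesis the resulting vector would typically feed a recurrent network or another function approximator.

```python
import numpy as np

def decaying_trace(observations, lam=0.9):
    # Exponentially decaying trace of past observations: one simple
    # auxiliary input that summarizes the agent's history (illustrative
    # update rule; the decay parameter lam is an assumption).
    trace = np.zeros_like(observations[0], dtype=float)
    traces = []
    for obs in observations:
        trace = lam * trace + (1.0 - lam) * obs
        traces.append(trace)
    return traces

def agent_state(obs, aux):
    # Agent state built by concatenating the current observation with an
    # auxiliary input; in practice this vector would be the input to a
    # recurrent network or other function approximator.
    return np.concatenate([obs, aux])

# Toy usage: the first and third observations are identical (aliased),
# but their agent states differ once the decaying trace is appended.
obs_stream = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])]
traces = decaying_trace(obs_stream)
states = [agent_state(o, z) for o, z in zip(obs_stream, traces)]
print(states[0])  # state on the first visit to observation [1, 0]
print(states[2])  # same observation, different agent state
```

In this toy example the two aliased observations yield different agent states because the trace carries information about the history that preceded each of them, which is the discrimination effect the abstract describes.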