Reinforcement Learning on Resource Bounded Systems

Travnik, Jaden

doi:10.7939/R39G5GV5S

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Travnik, Jaden
Recent advancements in reinforcement learning have made the field interesting to academia and industry alike. Many of these advancements depend on deep learning as a means to approximate a value function or a policy. This dependency usually requires high-performance hardware (e.g., a graphics processing unit, GPU), and applications of deep learning are often limited to domains where a substantial amount of time is allowed for a prediction to be made or an action to be chosen. Although these criteria cover many important use cases, the growing popularity of the "Internet of Things," wearable electronics, and the advancement of myoelectric prosthetic limbs present a rapidly growing real-time domain of resource bounded systems that are not well suited to deep learning yet could still benefit from the application of reinforcement learning. Furthermore, many of these systems are limited by the physical space they must occupy. This restricts the size of the hardware and thereby the computational and memory resources that can be used. Despite these restrictions, demand for prompt actions from receptive systems continues to grow. To address these problems, I first highlight the difficulties one faces when implementing reinforcement learning on a system that is deployed in an asynchronous environment, and I introduce a new performance metric: the time it takes for a system to react to an observed state of an asynchronous environment. Secondly, I develop a class of algorithms that addresses these issues by reordering the algorithmic components to minimize reaction time. Thirdly, to minimize both the time and memory necessary to compute function approximation, I introduce a novel linear function approximation method, selective Kanerva coding (SKC), that allows a reinforcement learning agent to perform behaviors reactively in real time while using less memory and computation time than the standard linear approach of tile coding.
I also show that SKC is less sensitive to the curse of dimensionality than tile coding, making SKC a significant step towards accurately representing high-dimensional data on resource bounded systems. Moreover, I show that SKC can make the inclusion of more sensory modalities more feasible, which can increase prediction accuracy when those modes of sensation are relevant to the prediction. Finally, I present an exploration of the meta-parameters of SKC and evaluate the performance of two different variations of SKC against the original formulation. These findings are important to the current state of the field of reinforcement learning, as they form a challenging perspective that runs contrary to the field's current focus on deep learning. I form this argument by emphasizing the impracticality of deep learning in domains of resource bounded systems deployed in real-time environments, establishing the limitations on available computation and memory of these systems, and addressing these issues by proposing new insights, algorithms, and representations.
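The abstract names selective Kanerva coding but does not spell out its mechanics, so the following is only an illustrative sketch, not the thesis's implementation. It assumes that SKC represents a state by activating a fixed number of the nearest prototypes from a randomly placed set, and that a linear value estimate is the sum of the weights of those active prototypes. All function names, the uniform prototype placement, the squared Euclidean distance, and the TD(0) update are assumptions made for the example.

```python
import heapq
import random


def make_prototypes(num_prototypes, num_dims, seed=0):
    """Place prototypes uniformly at random in the unit hypercube (assumed)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(num_dims)]
            for _ in range(num_prototypes)]


def active_features(prototypes, state, num_active):
    """Return the indices of the num_active prototypes closest to state."""
    def sq_dist(i):
        return sum((p - s) ** 2 for p, s in zip(prototypes[i], state))
    # Selecting a fixed count keeps the number of active features constant,
    # regardless of how many dimensions the state has.
    return heapq.nsmallest(num_active, range(len(prototypes)), key=sq_dist)


def predict(weights, active):
    """Linear value estimate: sum of the weights of the active features."""
    return sum(weights[i] for i in active)


def td0_update(weights, active, reward, next_active, step_size, discount):
    """One TD(0) update on the weights of the currently active features."""
    td_error = (reward + discount * predict(weights, next_active)
                - predict(weights, active))
    # Divide the step size by the number of active features (a common
    # convention for binary feature vectors, assumed here).
    for i in active:
        weights[i] += (step_size / len(active)) * td_error
    return td_error
```

In this sketch the weight vector has one entry per prototype, so memory is set by the chosen number of prototypes rather than by the dimensionality of the state, which is the kind of property the abstract contrasts with tile coding.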
Subjects / Keywords
Graduation date

Spring 2018
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/R39G5GV5S
License

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Pilarski, Patrick (Computing Science)