Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots

Yuan, Yufeng

doi:10.7939/r3-2b10-d658

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Yuan, Yufeng
Abstract

An oft-ignored challenge of real-world reinforcement learning is that, unlike standard simulated environments, the real world does not pause when agents make learning updates. As standard simulated environments do not address this real-time aspect of learning, most available implementations of deep reinforcement learning algorithms process environment interactions and learning updates sequentially. Consequently, when such implementations are deployed in the real world, they may not act responsively and learn efficiently. Asynchronous learning has been proposed to solve this issue, but no systematic comparison between sequential and asynchronous reinforcement learning has been conducted using real-world environments. In this thesis, we set up two vision-based tasks with a robotic arm, implement an asynchronous learning system that extends a previous architecture, and compare sequential and asynchronous reinforcement learning across different action cycle times, sensory data dimensions, and mini-batch sizes. Our experiments show that when the time cost of learning updates increases, the action cycle time in the sequential implementation can grow excessively long, while the asynchronous implementation can always maintain a fixed and appropriate action cycle time. Consequently, when learning updates are expensive, the performance of sequential learning diminishes, and asynchronous learning outperforms it by a substantial margin. Our system learns in real time to reach and track visual targets from pixels within two hours of experience and does so directly using real robots, learning completely from scratch.
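The contrast between the two loop structures described in the abstract can be illustrated with a toy timing sketch. This is not the thesis's system: the function names `run_sequential` and `run_asynchronous`, the use of a Python thread for the learner, and the `time.sleep` calls standing in for robot interaction and gradient updates are all assumptions made purely for illustration.

```python
import threading
import time


def run_sequential(n_steps, cycle_time, update_cost):
    """Act and learn in one loop; return the observed action cycle times."""
    cycles = []
    for _ in range(n_steps):
        start = time.perf_counter()
        time.sleep(cycle_time)   # stand-in for acting and waiting for the next observation
        time.sleep(update_cost)  # stand-in for a learning update, which blocks the next action
        cycles.append(time.perf_counter() - start)
    return cycles


def run_asynchronous(n_steps, cycle_time, update_cost):
    """Act in the main loop while a background thread performs updates."""
    stop = threading.Event()
    updates_done = [0]  # mutable counter shared with the learner thread

    def learner():
        while not stop.is_set():
            time.sleep(update_cost)  # stand-in for a learning update
            updates_done[0] += 1

    thread = threading.Thread(target=learner, daemon=True)
    thread.start()

    cycles = []
    for _ in range(n_steps):
        start = time.perf_counter()
        time.sleep(cycle_time)  # acting is never blocked by the learner
        cycles.append(time.perf_counter() - start)

    stop.set()
    thread.join()
    return cycles, updates_done[0]
```

In this sketch the sequential cycle time grows by the full cost of each update, while the asynchronous cycle time stays close to the requested `cycle_time` regardless of `update_cost`. A real system would also need to share experience and policy parameters between the two loops, which is omitted here.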
Subjects / Keywords
Graduation date

Fall 2021
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-2b10-d658
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Mahmood, A. Rupam (Computing Science)