Effective Real-time Reinforcement Learning for Vision-Based Robotic Tasks

  • Author / Creator
    Wang, Yan
  • Vision is one of the essential means for humans to perceive the world.
    Similarly, today's intelligent robot agents rely on camera images to perform complex tasks in the real world.
    Due to the ever-changing nature of the real world, intelligent robot agents must continually learn from high-dimensional images to adapt to new environments.
    Such a capability entails learning from images on the fly as the agent interacts with its environment, which we call vision-based real-time learning.

    Recently, we have seen many successful applications of reinforcement learning (RL).
    It is natural to extend the scope of RL to vision-based real-time learning of robotic control tasks.
    However, a vision-based real-time robotic RL agent faces practical issues that are often ignored in conventional RL research.
    The first issue is that robots deployed in the real world are usually tethered to a resource-limited computer, while vision-based RL algorithms are computationally expensive.
    A prominent difference between real-time RL in the real world and conventional RL is that the time in the real world does not pause while the agent computes actions and updates policies.
    Given such a setup, it is unclear to what extent the performance of a learning system will be affected by resource limitations.
    Fortunately, in most cases, a powerful workstation can be wirelessly connected to the robot to provide extra computation resources.
    However, there is no systematic study of efficiently using the wirelessly connected powerful computer to compensate for performance loss.
    To shed some light on this issue, we propose and implement a real-time learning system called the Remote-Local Distributed (ReLoD) system to distribute the computations of two deep RL algorithms, Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), between a local computer and a remote computer.
    The performance is evaluated on two vision-based robot tasks developed using a robotic arm and a mobile robot.
    Our results show that SAC's performance degrades heavily on a resource-limited computer.
    Strikingly, offloading all of SAC's computations to a wirelessly connected workstation fails to improve performance.
    However, a carefully chosen distribution consistently and substantially improves performance on both tasks.
    On the other hand, the performance of PPO remains largely unaffected by the distribution of computations.
    In other words, without careful consideration, using a powerful remote computer may not improve performance (the first sketch after the abstract illustrates one such local/remote split).

    The second issue a real-time robotic RL agent faces is that designing dense rewards for vision-based real-robot tasks requires hand-engineering or pre-training, which can be unsuitable for unforeseen tasks.
    When formulating a real-world robotic task as a reinforcement learning task, it is crucial to determine a reward function that is convenient to specify, accurately captures the intended problem, and facilitates learning.
    Many designers of real-world robot tasks use domain knowledge to design informative dense rewards to facilitate training.
    However, designing task-dependent reward functions for real-time learning tasks, both vision-based and non-vision-based, is difficult since domain knowledge is generally unavailable for non-stationary and unforeseen environments.
    Moreover, hand-crafting a dense reward function for vision-based tasks is even more problematic because it requires an effective image encoder, which is generally unavailable beforehand.
    For so-called goal-reaching tasks, there is a simple way of designing a reward function that requires neither domain knowledge nor a pre-trained image encoder yet still aligns well with our intention: give a reward of -1 at every time step (the second sketch after the abstract illustrates this formulation).
    Goal-reaching tasks are formulated as episodic RL tasks that terminate upon reaching the terminal state.
    Since maximizing the undiscounted sum of these -1 rewards leads to reaching the terminal state as soon as possible, we call these minimum-time tasks, or vision-based minimum-time tasks when the terminal states are represented by images.
    Unfortunately, minimum-time tasks are usually avoided in practice, as they are considered difficult and uninformative for learning.
    In this thesis, we demonstrate that both non-vision-based and vision-based minimum-time tasks can be learned quickly from scratch.
    We also provide guidelines that practitioners can use to predict if the minimum-time task formulation is appropriate for their problems based on the performance of the initial policy.
    Following our guidelines on minimum-time tasks, we demonstrate that a single reinforcement learning system can learn pixel-based control in real time, from scratch, on several different kinds of real robots.
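
The abstract does not specify which of ReLoD's computations stay on the robot's local computer and which are offloaded, so the first sketch below only illustrates the general idea of such a split, assuming, hypothetically, that action inference runs locally while replay storage and gradient updates run on the remote workstation. Multiprocessing queues stand in for the wireless link, and every name and update rule is an illustrative placeholder rather than the thesis's API.

```python
# Minimal sketch of splitting an actor-learner loop between a resource-limited
# "local" process (the robot's computer) and a "remote" learner process (the
# workstation). Queues stand in for the wireless link; names are placeholders.
import multiprocessing as mp
import time

import numpy as np


def local_actor(transition_q, weights_q, steps=200):
    """Runs on the robot: act in real time and stream transitions out."""
    weights = np.zeros(4)                    # placeholder policy parameters
    obs = np.random.rand(4)                  # placeholder camera observation
    for _ in range(steps):
        action = float(weights @ obs)        # cheap inference stays local
        next_obs, reward = np.random.rand(4), -1.0  # stand-in environment step
        transition_q.put((obs, action, reward, next_obs))
        obs = next_obs
        while not weights_q.empty():         # adopt the newest policy, if any
            weights = weights_q.get()
        time.sleep(0.01)                     # real time keeps flowing regardless


def remote_learner(transition_q, weights_q):
    """Runs on the workstation: buffer transitions and do expensive updates."""
    replay, weights = [], np.zeros(4)
    while True:
        replay.append(transition_q.get())
        if len(replay) % 32 == 0:            # stand-in for a batched SAC update
            weights = weights + 1e-3 * np.random.randn(4)
            weights_q.put(weights)           # ship the refreshed policy back


if __name__ == "__main__":
    t_q, w_q = mp.Queue(), mp.Queue()
    mp.Process(target=remote_learner, args=(t_q, w_q), daemon=True).start()
    local_actor(t_q, w_q)
```

Which computations to keep local is exactly the design question the ReLoD experiments investigate; the split above is only one possibility, not the distribution the thesis found to work best.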
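
To make the minimum-time reward design concrete, the second sketch is a minimal, self-contained example of a goal-reaching task posed with a per-step reward of -1. The toy 2D state, goal predicate, and hand-written policy are placeholders, not the thesis's vision-based tasks, where the terminal state would instead be detected from images.

```python
# Minimal sketch of the minimum-time formulation: every step costs -1 and the
# episode terminates as soon as a goal predicate holds, so the undiscounted
# return equals the negative number of steps taken to reach the goal.
import numpy as np


class MinimumTimeTask:
    """A toy goal-reaching task posed as a minimum-time (per-step -1) task."""

    def __init__(self, goal_radius=0.1, seed=0):
        self.goal_radius = goal_radius
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.state = self.rng.uniform(-1.0, 1.0, size=2)  # stand-in observation
        return self.state

    def step(self, action):
        self.state = self.state + 0.05 * np.asarray(action)
        reached = bool(np.linalg.norm(self.state) < self.goal_radius)  # goal test
        # No domain knowledge or image encoder needed: the reward is always -1.
        return self.state, -1.0, reached


if __name__ == "__main__":
    env = MinimumTimeTask()
    obs, ret = env.reset(), 0.0
    for t in range(1, 501):
        obs, reward, done = env.step(-obs)  # naive "move toward the goal" policy
        ret += reward                       # undiscounted return
        if done:
            break
    # Maximizing this return (-t) is the same as reaching the goal in minimum time.
    print(f"reached the goal in {t} steps, return = {ret}")
```

Because an agent maximizing this undiscounted return has no incentive to delay termination, the formulation aligns with the goal-reaching intention without any hand-crafted reward shaping.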

  • Subjects / Keywords
  • Graduation date
    Spring 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-4ame-w625
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.