Decision Frequency Adaptation in Reinforcement Learning Using Continuous Options with Open-Loop Policies

  • Author / Creator
    Karimi, Amirmohammad
  • Abstract
    In classic reinforcement learning (RL) for continuous control, agents make decisions at discrete, fixed time intervals. The duration between decisions is therefore a crucial hyperparameter. Setting it too short may make the problem harder by requiring the agent to make numerous decisions to achieve its goal, while setting it too long can cause the agent to lose control over the system. However, physical systems do not necessarily require a constant control frequency. For learning agents, it is often preferable to make decisions at a low frequency when possible and at a high frequency when necessary. Control frequency adaptation methods have previously been proposed in temporal-abstraction RL. However, like classic RL, these methods often do not consider physical time and treat task time steps as discrete. This can make learning sensitive to the underlying task interaction frequency. We propose a framework called Continuous-Time Continuous-Options (CTCO), in which the agent chooses options as open-loop sub-policies of variable duration. These options are defined in continuous time and can interact with the system at any desired frequency, providing smooth, extended continuous actions. We demonstrate the effectiveness of CTCO by comparing its performance to classical RL and temporal-abstraction RL methods on simulated and real-world continuous control tasks with various action-cycle times. We show that our algorithm's performance is not affected by the choice of task interaction frequency. Moreover, we show the benefit of open-loop options over simple action repetition. Finally, we demonstrate the efficacy of CTCO in facilitating exploration in a real-world visual reaching task with sparse rewards for a 7-DOF robotic arm.
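The idea of an option that is defined in continuous time and can be sampled at any interaction frequency can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the class name `OpenLoopOption`, the Gaussian-bump parameterization, the `width` value, and the `rollout` helper are all hypothetical. The point is only that an open-loop option maps elapsed time, not state, to an action, so the same option yields the same underlying action trajectory whether it is queried every 0.1 s or every 0.05 s.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OpenLoopOption:
    """Hypothetical option: a duration plus weights over Gaussian bumps in normalized time."""
    duration: float        # seconds the option runs before the agent decides again
    weights: List[float]   # one weight per basis function (illustrative parameterization)
    width: float = 0.25    # bump width in normalized time (assumed value)

    def action(self, elapsed: float) -> float:
        """Action at `elapsed` seconds after the option started; depends on time only."""
        # Normalize elapsed time to a phase in [0, 1].
        phase = min(max(elapsed / self.duration, 0.0), 1.0)
        n = len(self.weights)
        centers = [i / (n - 1) if n > 1 else 0.5 for i in range(n)]
        basis = [math.exp(-((phase - c) / self.width) ** 2) for c in centers]
        # Normalized weighted sum gives a smooth blend of the weights over time.
        return sum(w * b for w, b in zip(self.weights, basis)) / sum(basis)


def rollout(option: OpenLoopOption, dt: float) -> List[Tuple[float, float]]:
    """Sample (time, action) pairs every `dt` seconds until the option's duration elapses."""
    steps = round(option.duration / dt)
    return [(k * dt, option.action(k * dt)) for k in range(steps)]


option = OpenLoopOption(duration=0.4, weights=[0.0, 1.0, -0.5])
coarse = rollout(option, dt=0.1)   # low interaction frequency
fine = rollout(option, dt=0.05)    # twice the interaction frequency, same option
```

Under these assumptions, every second sample of `fine` coincides with a sample of `coarse`, which is the sense in which the option itself does not depend on the action-cycle time. Simple action repetition would correspond to the special case of a constant action over the whole duration.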

  • Subjects / Keywords
  • Graduation date
    Fall 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-75yw-7d12
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.