Improving Different Aspects in RL - Accelerating Convergence Rate & Enhancing Safety and Robustness

Gao, Yue

doi:10.7939/r3-1sxj-1148

Reinforcement learning (RL) has moved from toy domains to real-world applications, yet each of these applications carries inherent difficulties that are long-standing challenges in RL, such as getting stuck at plateaus, limited training time, costly exploration, and safety considerations. Together with my collaborators, I proposed several RL algorithms that improve different aspects of performance: geometry-aware gradient descent (GNGD), a policy gradient method (also applicable to other non-convex optimization problems) with strong theoretical convergence results; and a family of Q-learning algorithms that empirically enhance risk aversion and robustness in trading markets.

Beyond RL, geometry-aware descent methods can be applied to any first-order non-uniform optimization problem and can converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bound.

For example, when applied to policy gradient (PG) and generalized linear models (GLM), it can be shown that normalizing the gradient ascent method accelerates convergence to $O(e^{-t})$ while incurring less overhead than existing algorithms, which significantly improves on the best known results. It can also be shown that the proposed geometry-aware descent methods escape landscape plateaus faster than standard gradient descent. Experimental results are used to illustrate and complement the theoretical findings.
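
The effect of normalizing the gradient can be illustrated with a toy sketch. This is not the thesis's algorithm or experimental setup: the objective $f(x) = x^4$, the step size, and the normalization exponent $2/3$ are all assumptions chosen for illustration. On this objective the landscape is very flat near the optimum, so plain gradient descent slows down, while dividing the gradient by $|f'(x)|^{2/3}$ (a proxy for the local curvature of this particular function) gives a constant contraction factor per step.

```python
def f(x):
    # Toy objective (an assumption): very flat near its minimum at x = 0.
    return x ** 4


def grad_f(x):
    return 4.0 * x ** 3


def gradient_descent(x0, step_size, num_steps):
    # Standard gradient descent with a fixed step size.
    x = x0
    for _ in range(num_steps):
        x = x - step_size * grad_f(x)
    return x


def normalized_gradient_descent(x0, step_size, num_steps, exponent=2.0 / 3.0):
    # Gradient descent where each step is divided by |gradient| ** exponent.
    # The exponent 2/3 is a hypothetical choice matched to f(x) = x**4:
    # |f'(x)| ** (2/3) is proportional to x**2, like the local curvature.
    x = x0
    for _ in range(num_steps):
        g = grad_f(x)
        if g == 0.0:
            break  # already at the stationary point
        x = x - step_size * g / abs(g) ** exponent
    return x


x_gd = gradient_descent(x0=1.0, step_size=0.1, num_steps=50)
x_ngd = normalized_gradient_descent(x0=1.0, step_size=0.1, num_steps=50)
print("gradient descent:           ", x_gd, f(x_gd))
print("normalized gradient descent:", x_ngd, f(x_ngd))
```

With the same step size and iteration budget, the normalized iterate shrinks geometrically, whereas the plain gradient descent iterate decays only polynomially on this function.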

On the empirical side of RL, to enhance robustness and reduce risk, a family of Q-learning algorithms was proposed that takes characteristics such as risk-awareness, robustness to perturbations, and low learning variance as building blocks. These algorithms perform well in trading markets and balance theoretical guarantees with practical use.
Graduation date

Fall 2021
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-1sxj-1148
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Csaba Szepesvari (Computing Science)