Advances in Distributional Reinforcement Learning: Bridging Theory with Algorithmic Practice

  • Author / Creator
    Sun, Ke
  • This thesis comprehensively investigates Distributional Reinforcement Learning (RL), a vibrant research field at the interplay of statistics and RL. As an extension of classical RL, distributional RL on the one hand draws on a wealth of statistical ideas by incorporating distributional learning, including density estimation and distribution divergences; on the other hand, it engages frontier issues within RL, such as exploration, optimization, and uncertainty. In this thesis, we examine the benefits of being distributional in RL by exploring the resulting theoretical advantages and properties, including regularization, optimization, and robustness to training noise. This investigation in turn motivates the design of novel distributional RL algorithms.

    In the first paper, we examine the benefits of being categorical distributional in RL from the perspective of regularization. By applying a return density function decomposition technique, we attribute the potential superiority of distributional RL to a derived distribution-matching regularization. This regularization, previously unexplored in the distributional RL context, captures return distribution knowledge beyond the expectation alone, contributing to an augmented reward signal in policy optimization. In the second paper, we provide further evidence of the benefits of distributional RL through the lens of optimization. We show that the distributional loss of distributional RL has desirable smoothness characteristics and hence enjoys stable gradients.
    Furthermore, we show that distributional RL performs favorably when the return distribution approximation is appropriate, as measured by the variance of gradient estimates in each environment. In the third paper, we study the training robustness of distributional RL by establishing the contraction of distributional Bellman operators in the proposed State-Noisy Markov Decision Process (SN-MDP), a representative tabular setting that incorporates both random and adversarial state observation noise. In the noisy setting with function approximation, we theoretically characterize the bounded gradient norm of the distributional RL loss in terms of the state features, which explains its improved training robustness against state observation noise. In the last paper, we propose a novel distributional RL algorithm, Sinkhorn distributional RL (SinkhornDRL), which leverages the Sinkhorn divergence, a regularized Wasserstein loss, to minimize the discrepancy between the current and target Bellman return distributions. Theoretically, we prove the contraction properties of SinkhornDRL, consistent with the interpolating nature of the Sinkhorn divergence between the Wasserstein distance and Maximum Mean Discrepancy (MMD).

    In summary, these papers contribute to a theoretical understanding of the benefits of being fully distributional in RL, compared with classical RL, which focuses only on the expectation of the return distribution. Together with our algorithm designs, our work not only offers insights to guide practitioners in deploying distributional RL in real applications but also aims to inspire researchers in related areas across statistics, machine learning, operations research, and control. Illustrative background sketches of the core objects discussed above (the distributional Bellman recursion, a categorical projection, and the Sinkhorn divergence) follow this abstract.
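
    For reference, the display below recalls the standard distributional Bellman recursion from the distributional RL literature, the object whose contraction properties are invoked above; classical RL retains only its expectation. This is generic background, not a result specific to this thesis.

      % Distributional Bellman recursion (generic background):
      % the random return Z^pi satisfies a distributional fixed-point equation,
      % while classical RL keeps only its expectation Q^pi.
      \[
        Z^{\pi}(s,a) \stackrel{D}{=} R(s,a) + \gamma\, Z^{\pi}(S', A'),
        \qquad S' \sim P(\cdot \mid s, a), \; A' \sim \pi(\cdot \mid S'),
      \]
      \[
        Q^{\pi}(s,a) = \mathbb{E}\big[ Z^{\pi}(s,a) \big].
      \]
      % The induced distributional Bellman operator is a gamma-contraction in the
      % supremum p-Wasserstein metric; this is the kind of contraction property
      % referred to above for the SN-MDP analysis and for SinkhornDRL.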
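
    The first paper concerns the categorical (histogram-like) parameterization of the return distribution. As background only, the following is a minimal Python sketch of the standard C51-style categorical projection that maps a Bellman target onto a fixed support; the thesis's return density decomposition and the derived distribution-matching regularizer are not reproduced here, and all names are illustrative.

      import numpy as np

      def categorical_projection(rewards, dones, next_probs,
                                 v_min=-10.0, v_max=10.0, gamma=0.99):
          """Project the Bellman target r + gamma * z onto a fixed categorical support.

          Generic C51-style background sketch, not the thesis's algorithm.
          rewards:    (B,) rewards;  dones: (B,) 0/1 terminal flags
          next_probs: (B, K) next-state return probabilities on the support atoms
          """
          batch, n_atoms = next_probs.shape
          support = np.linspace(v_min, v_max, n_atoms)        # atom locations z_1..z_K
          delta_z = (v_max - v_min) / (n_atoms - 1)

          # Bellman-updated atom locations, clipped back into [v_min, v_max].
          tz = np.clip(rewards[:, None]
                       + gamma * (1.0 - dones[:, None]) * support[None, :],
                       v_min, v_max)
          b = (tz - v_min) / delta_z                          # fractional atom index
          lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)

          target = np.zeros_like(next_probs)
          for i in range(batch):
              for j in range(n_atoms):
                  if lower[i, j] == upper[i, j]:              # lands exactly on an atom
                      target[i, lower[i, j]] += next_probs[i, j]
                  else:                                       # split mass between neighbours
                      target[i, lower[i, j]] += next_probs[i, j] * (upper[i, j] - b[i, j])
                      target[i, upper[i, j]] += next_probs[i, j] * (b[i, j] - lower[i, j])
          return target   # train by cross-entropy against the predicted categorical pmf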
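
    The last paper's SinkhornDRL minimizes a Sinkhorn divergence between samples drawn from the current and target return distributions. The NumPy sketch below computes one common variant of that divergence (the debiased transport cost under entropic regularization, assuming uniform sample weights); it only illustrates the quantity being minimized and makes no claim about the thesis's actual implementation.

      import numpy as np

      def entropic_ot(x, y, eps=0.1, n_iters=200):
          """Entropic-regularized transport cost between uniform empirical measures on x and y."""
          n, m = len(x), len(y)
          cost = (x[:, None] - y[None, :]) ** 2               # squared-distance cost matrix C
          kernel = np.exp(-cost / eps)                        # Gibbs kernel K = exp(-C / eps)
          a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)     # uniform marginal weights
          v = np.ones(m)
          for _ in range(n_iters):                            # Sinkhorn-Knopp iterations
              u = a / (kernel @ v)
              v = b / (kernel.T @ u)
          plan = u[:, None] * kernel * v[None, :]             # entropic transport plan
          return float(np.sum(plan * cost))

      def sinkhorn_divergence(x, y, eps=0.1):
          """Debiased Sinkhorn divergence; interpolates between Wasserstein (eps -> 0) and MMD (eps -> infinity)."""
          return entropic_ot(x, y, eps) - 0.5 * (entropic_ot(x, x, eps) + entropic_ot(y, y, eps))

      # Toy usage: samples standing in for the current vs. target return distributions of one (s, a) pair.
      current = np.random.normal(0.0, 1.0, size=64)
      target = np.random.normal(0.5, 1.2, size=64)
      print(sinkhorn_divergence(current, target))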

  • Subjects / Keywords
  • Graduation date
    Fall 2024
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-a8na-xa12
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.