Usage
  • 116 views
  • 375 downloads

Hindsight Rational Learning for Sequential Decision-Making: Foundations and Experimental Applications

  • Author / Creator
    Morrill, Dustin
  • This thesis develops foundations for the development of dependable, scalable reinforcement learning algorithms with strong connections to game theory. I present a version of rationality for learning---one grounded in the learner's experience and connected with the rationality concepts of optimality and equilibrium---that demands resiliency to uncertainty, environmental changes, and adversarial pressures. This notion of hindsight rationality is based on regret, a well-known concept for evaluating a sequence of decisions with unilateral deviations. I show that in sequential decision-making tasks, there are many natural deviation sets with critical practical differences beyond those previously studied. I design and implement three extensions to the counterfactual regret minimization (CFR) algorithm, one that is observably sequentially hindsight rational for any given subset of deviations within a broad class; a second that generalizes regression CFR; and a third that applies to continuing Markov decision processes and robust optimization tasks.

    The first part develops hindsight rationality and the partially observable history process (POHP) formalism for concisely describing multi-agent sequential decision-making from a single agent's perspective.The second part develops the foundations of defining, analyzing, and using deviations in finite-horizon POHPs to develop efficient hindsight rational algorithms, and the practical consequences of designing algorithms around different deviation sets. The third and final part describes experimental applications of these foundations that use function approximation and condensed domain representations to effectively play games and learn cautious behavior in safety challenges.

  • Subjects / Keywords
  • Graduation date
    Fall 2022
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-2as1-6s79
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.