Hindsight Rational Learning for Sequential Decision-Making: Foundations and Experimental Applications

Morrill, Dustin

doi:10.7939/r3-2as1-6s79

Author / Creator

Morrill, Dustin
This thesis develops foundations for dependable, scalable reinforcement learning algorithms with strong connections to game theory. I present a version of rationality for learning (one grounded in the learner's experience and connected with the rationality concepts of optimality and equilibrium) that demands resiliency to uncertainty, environmental changes, and adversarial pressures. This notion of hindsight rationality is based on regret, a well-known concept for evaluating a sequence of decisions against unilateral deviations from it. I show that sequential decision-making tasks admit many natural deviation sets beyond those previously studied, with critical practical differences among them. I design and implement three extensions to the counterfactual regret minimization (CFR) algorithm: one that is observably sequentially hindsight rational for any given subset of deviations within a broad class, a second that generalizes regression CFR, and a third that applies to continuing Markov decision processes and robust optimization tasks.

The first part develops hindsight rationality and the partially observable history process (POHP) formalism for concisely describing multi-agent sequential decision-making from a single agent's perspective. The second part develops foundations for defining, analyzing, and using deviations in finite-horizon POHPs to build efficient hindsight rational algorithms, and examines the practical consequences of designing algorithms around different deviation sets. The third and final part describes experimental applications of these foundations that use function approximation and condensed domain representations to play games effectively and to learn cautious behavior in safety challenges.
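
To make the regret-based notion of rationality above concrete, here is a minimal sketch in the simplest possible setting: a repeated single-decision task with full reward feedback. This setting, the function names (`external_regret`, `swap_gain`, `internal_regret`, `swap_regret`), and the example rewards are hypothetical and chosen for illustration only; this is not the thesis's POHP formalism or one of its CFR extensions. Each function measures the same realized play against a different standard deviation set: external deviations always play one fixed action, internal deviations replace one action a with another action b, and swap deviations apply any action-to-action mapping.

```python
# Hypothetical illustration: repeated single-decision setting, full reward feedback.
from itertools import product


def external_regret(actions, rewards):
    """Gain of the best fixed action, played in every round, over the realized play.

    actions: chosen action index per round.
    rewards: rewards[t][b] is the reward action b would have earned in round t.
    """
    realized = sum(r[x] for x, r in zip(actions, rewards))
    num_actions = len(rewards[0])
    best_fixed = max(sum(r[b] for r in rewards) for b in range(num_actions))
    return best_fixed - realized


def swap_gain(actions, rewards, a, b):
    """Gain from playing b instead of a in exactly the rounds where a was played."""
    return sum(r[b] - r[x] for x, r in zip(actions, rewards) if x == a)


def internal_regret(actions, rewards):
    """Best gain from a single replacement a -> b (a == b gives 0, so this is >= 0)."""
    n = len(rewards[0])
    return max(swap_gain(actions, rewards, a, b) for a, b in product(range(n), repeat=2))


def swap_regret(actions, rewards):
    """Best gain from any mapping phi: each action a gets its own best replacement."""
    n = len(rewards[0])
    return sum(max(swap_gain(actions, rewards, a, b) for b in range(n)) for a in range(n))


# Made-up example: three rounds in which the learner plays 0, then 1, then 2.
actions = [0, 1, 2]
rewards = [[2, 3, 0], [0, 2, 3], [3, 0, 2]]
print(external_regret(actions, rewards))  # -1: every fixed action totals 5 < realized 6
print(internal_regret(actions, rewards))  # 1: e.g. playing 1 instead of 0 in round 1
print(swap_regret(actions, rewards))      # 3: the cyclic mapping 0->1, 1->2, 2->0
```

On this made-up sequence no fixed action beats the learner, so external regret is negative, yet every action has a profitable replacement. A learner judged only against external deviations would look rational here, while the larger internal and swap deviation sets expose consistent mistakes: a small, non-sequential instance of why the choice of deviation set matters.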

Graduation date

Fall 2022
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/r3-2as1-6s79
License

This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Bowling, Michael (Computing Science)
- Greenwald, Amy (Computer Science, Brown University)