
Regret Minimization with Function Approximation in Extensive-Form Games

  • Author / Creator
    D'Orazio, Ryan
  • Computing a Nash equilibrium in zero-sum games,
    or more generally solving saddle point
    optimization problems, is a fundamental problem
    in game theory and machine learning, with
    applications spanning a wide variety of domains,
    from generative modeling and computer vision to
    super-human AI in imperfect-information games
    like poker.
    Despite the broad applicability of Nash
    equilibria, traditional methods from optimization
    and machine learning are not directly applicable.
    In zero-sum games, however, a simple and
    effective method exists -- self-play with online
    learning.
    In this setup, an equilibrium is computed by
    pitting two learning algorithms against each
    other in repeated play of the game.
    Online learning with self-play via Counterfactual
    Regret Minimization (CFR) is the leading
    approach for saddle point computation in large games
    with sequential decision making and imperfect
    information.
    For very large games, CFR can be scaled along
    several dimensions, including sampling, subgame
    decomposition, and function approximation.
    Despite the growing interest in scaling
    algorithms with function approximation in areas
    such as reinforcement learning, current
    theoretical guarantees for CFR with function
    approximation are minimal.
    In this thesis we extend theoretical results for
    CFR with function approximation, and
    complement these worst-case guarantees with experiments
    on several common benchmark games with sequential
    decision making and imperfect information.
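
    The self-play procedure described above can be
    sketched concretely. The following is a minimal
    illustration (our own, not from the thesis) of
    two regret-matching learners -- the regret
    minimizer underlying CFR -- playing a small
    zero-sum matrix game; their time-averaged
    strategies approach a Nash equilibrium. The
    payoff matrix and all names are illustrative
    choices.

```python
import numpy as np

# Illustrative zero-sum matrix game (row player's payoffs).
# Its unique Nash equilibrium has both players mixing (0.4, 0.6).
PAYOFF = np.array([[2.0, -1.0],
                   [-1.0, 1.0]])

def regret_matching(regrets):
    """Map cumulative regrets to a mixed strategy.

    Positive regrets are normalized; with no positive
    regret, play uniformly."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    if total > 0:
        return pos / total
    return np.full(len(regrets), 1.0 / len(regrets))

def self_play(iterations=50000):
    """Pit two regret-matching learners against each other.

    Returns the time-averaged strategy of each player; by the
    standard regret-to-equilibrium argument, these averages
    converge to a Nash equilibrium of the zero-sum game."""
    n_row, n_col = PAYOFF.shape
    regrets = [np.zeros(n_row), np.zeros(n_col)]
    sums = [np.zeros(n_row), np.zeros(n_col)]
    for _ in range(iterations):
        s1 = regret_matching(regrets[0])
        s2 = regret_matching(regrets[1])
        sums[0] += s1
        sums[1] += s2
        u1 = PAYOFF @ s2        # row player's value for each action
        u2 = -(s1 @ PAYOFF)     # column player's value (zero-sum)
        # Instantaneous regret: action value minus realized value.
        regrets[0] += u1 - s1 @ u1
        regrets[1] += u2 - s2 @ u2
    return sums[0] / iterations, sums[1] / iterations

avg_row, avg_col = self_play()
# Both averages approach the equilibrium mix (0.4, 0.6).
```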

    The thesis is organized as follows.
    First, relevant background is given by defining
    external regret -- a quantity used to evaluate
    online learning algorithms.
    The connection between regret, self-play, and
    general concave-convex saddle point problems is
    then presented.
    A generalization of external regret for the
    online decision problem is also reviewed.
    The main theoretical contributions are then
    presented, generalizing previous work to
    different types of regret in the online decision
    problem and to different algorithms.
    The new theoretical guarantees with function
    approximation give rise to two new families of
    algorithms, f-RCFR and f-RCFR+, which combine
    function approximation with CFR-like algorithms.
    Both families of algorithms are then compared
    across several games, with f-RCFR+ demonstrating
    superior performance and better management of
    function approximator capacity.
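
    To make the central evaluation quantity concrete:
    external regret compares the payoff the learner
    accumulated with the payoff of the best single
    action chosen in hindsight. A small sketch (our
    own illustration; the function and variable names
    are not from the thesis):

```python
import numpy as np

def external_regret(strategies, utility_vectors):
    """External regret of an online learner after T rounds.

    strategies: list of T probability vectors the learner played.
    utility_vectors: list of T utility vectors (one entry per
        action) revealed after each round.
    Returns the total payoff of the best fixed action in
    hindsight minus the learner's realized expected payoff."""
    # Total payoff each fixed action would have earned.
    cumulative = np.sum(utility_vectors, axis=0)
    # Expected payoff the learner actually earned.
    earned = sum(s @ u for s, u in zip(strategies, utility_vectors))
    return cumulative.max() - earned

# A learner that always plays uniformly over two actions:
strategies = [np.array([0.5, 0.5])] * 4
utilities = [np.array([1.0, 0.0])] * 4   # action 0 was always best
# Best fixed action earns 4; the uniform learner earned 2.
regret = external_regret(strategies, utilities)  # → 2.0
```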

  • Graduation date
    Fall 2020
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-040j-9e84
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.