Guarantees for Self-Play via Polymatrix Decomposability

MacQueen, Revan

doi:doi:10.7939/r3-def8-vy54

This decommissioned ERA site remains active temporarily to support our final migration steps to https://ualberta.scholaris.ca, ERA's new home. All new collections and items, including Spring 2025 theses, are at that site. For assistance, please contact erahelp@ualberta.ca.

View

Download

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Usage

153 views
171 downloads

Guarantees for Self-Play via Polymatrix Decomposability

Author / Creator

MacQueen, Revan
Self-play is a technique for machine learning in multi-agent systems where a learning algorithm learns by interacting with copies of itself. Self-play is useful for generating large quantities of data for learning, but has the drawback that agents the learner will face post-training may have dramatically different behaviour than the learner came to expect by interacting with itself. For the case of two-player constant-sum games, self-play that reaches Nash equilibrium is guaranteed to produce strategies that cannot lose utility from their equilibrium value against any post-training opponent; however, no such guarantee exists for multi-player games.

We show that in games that approximately decompose into a set of two-player constant-sum games (called polymatrix games) where global $\epsilon$-Nash equilibria are boundedly far from Nash-equilibria in each subgame (called subgame stability), any no-external-regret algorithm that learns by self-play will produce a strategy with bounded loss of utility against new agents, which we call vulnerability. In approximate subgame stable constant sum polymatrix (SS-CSP) games, the strategies produced by self-play are also exchangeable and have values that fall into a bounded range. We extend these results to extensive-form games and give an efficient representation and algorithm for such a decomposition. We demonstrate our findings through experiments on Kuhn and Leduc poker. Finally, we extend our results to games which are strategically equivalent to SS-CSP games. For the first time, our results identify a structural property of multi-player games that enable performance guarantees for the strategies produced by a broad class of self-play algorithms.
Subjects / Keywords
Graduation date

Fall 2023
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-def8-vy54
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Wright, James (Computing Science)