Usage
  • 62 views
  • 79 downloads

Guarantees for Self-Play via Polymatrix Decomposability

  • Author / Creator
    MacQueen, Revan
  • Self-play is a technique for machine learning in multi-agent systems where a learning algorithm learns by interacting with copies of itself. Self-play is useful for generating large quantities of data for learning, but has the drawback that agents the learner will face post-training may have dramatically different behaviour than the learner came to expect by interacting with itself. For the case of two-player constant-sum games, self-play that reaches Nash equilibrium is guaranteed to produce strategies that cannot lose utility from their equilibrium value against any post-training opponent; however, no such guarantee exists for multi-player games.

    We show that in games that approximately decompose into a set of two-player constant-sum games (called polymatrix games) where global $\epsilon$-Nash equilibria are boundedly far from Nash-equilibria in each subgame (called subgame stability), any no-external-regret algorithm that learns by self-play will produce a strategy with bounded loss of utility against new agents, which we call vulnerability. In approximate subgame stable constant sum polymatrix (SS-CSP) games, the strategies produced by self-play are also exchangeable and have values that fall into a bounded range. We extend these results to extensive-form games and give an efficient representation and algorithm for such a decomposition. We demonstrate our findings through experiments on Kuhn and Leduc poker. Finally, we extend our results to games which are strategically equivalent to SS-CSP games. For the first time, our results identify a structural property of multi-player games that enable performance guarantees for the strategies produced by a broad class of self-play algorithms.

  • Subjects / Keywords
  • Graduation date
    Fall 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-def8-vy54
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.