Solving Common-Payoff Games with Approximate Policy Iteration

Sokota, Samuel

doi:doi:10.7939/r3-4gp9-jc31

This decommissioned ERA site remains active temporarily to support our final migration steps to https://ualberta.scholaris.ca, ERA's new home. All new collections and items, including Spring 2025 theses, are at that site. For assistance, please contact erahelp@ualberta.ca.

View

Download

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Usage

670 views
880 downloads

Solving Common-Payoff Games with Approximate Policy Iteration

Author / Creator

Sokota, Samuel
For artificially intelligent learning systems to be deployed widely in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is challenging. Even finding approximately optimal joint policies of decentralized partially observable Markov decision processes (Dec-POMDPs), a standard formalism for modeling these interactions, is a NEXP-complete problem. While there has been significant progress in developing strong teams of agents over the last decade, this progress has been fragmented among the engineering community, the Dec-POMDP community, and the deep multi-agent reinforcement learning (MARL) community. This thesis begins by reviewing the literature of each of these communities in unified language and formalizes a connection between recent developments exploiting common knowledge in common-payoff games and two-player zero-sum games. It then experimentally compares a select group of never-before-compared successful training paradigms. Lastly, and most significantly, it identifies and fills an algorithmic gap existing between the engineering and Dec-POMDP communities, which have proposed algorithms that are asymptotically optimal but difficult to scale, and the deep MARL community, which has proposed algorithms that, while scalable, have difficulty solving even small games, by proposing cooperative approximate policy iteration (CAPI), a novel algorithm for computing joint policies in common-payoff games. Experiments demonstrate that CAPI is capable of solving games that are orders of magnitudes larger than those that have been solved in existing literature.
Subjects / Keywords
Graduation date

Fall 2020
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-4gp9-jc31
License

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- White, Martha (Computing Science)
- Lanctot, Marc (NA)