Using Regret Estimation to Solve Games Compactly

Morrill, Dustin R

doi:10.7939/R3NZ80Z2Z

Author / Creator

Morrill, Dustin R
Game-theoretic solution concepts, such as Nash equilibrium strategies that are optimal against worst-case opponents, provide guidance in finding desirable autonomous agent behaviour. In particular, we wish to approximate solutions to complex, dynamic tasks, such as negotiation or bidding in auctions. Computational game theory investigates effective methods for computing such strategies. Solving human-scale games, however, is currently an intractable problem. Counterfactual Regret Minimization (CFR) is a regret-minimizing, online learning algorithm that dominates the Annual Computer Poker Competition (ACPC) and lends itself readily to various sampling and abstraction techniques. Abstract games are created to mirror the strategic elements of an original game in a more compact representation. The abstract game can be solved, and its solution can be translated back into the full game. But crafting an abstract game requires domain-specific knowledge, and an abstraction can interact with the game-solving process in unintuitive and harmful ways. For example, abstracting a game can create pathologies in which solutions to more granular abstractions are more exploitable by a worst-case opponent in the full game than solutions derived from simpler abstractions.
An abstraction that could be dynamically changed and informed by the solution process could produce better solutions more consistently. We suggest that such abstractions can be largely subsumed by a regressor on game features that estimates regret during CFR. Replacing abstraction with a regressor allows the memory required to approximate a solution to a game to be proportional to the complexity of the regressor rather than the size of the game itself. Furthermore, the regressor essentially becomes a tunable, compact, and dynamic abstraction of the game that is informed by and adapts to the particular solution being computed. These properties will allow this technique to scale to previously intractable domains. We call this new algorithm Regression CFR (RCFR). In addition to showing that this approach is theoretically and practically sound, we improve RCFR by combining it with regret-matching+. Experiments on two small poker games show that RCFR and its extension, RCFR+, can approximately solve games with regressors that are drastically less complex than the games themselves. In comparisons with traditional static abstractions of similar complexity, RCFR variants tend to produce less exploitable strategies.
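
The core idea in the abstract, deriving a policy from estimated rather than stored regrets, can be illustrated with a minimal sketch. This is not the thesis's implementation: the linear regressor, the class name `LinearRegretEstimator`, the helper `policy_from_estimator`, and the learning rate are all assumptions chosen for illustration, and the abstract does not say which regressor the thesis uses. The sketch only shows that memory scales with the regressor's parameters (actions times features) rather than with the number of decision points in the game.

```python
import numpy as np


def regret_matching(regrets):
    """Return a policy proportional to positive regrets (uniform if none are positive)."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))


class LinearRegretEstimator:
    """Hypothetical stand-in regressor: one weight vector per action over game features."""

    def __init__(self, num_features, num_actions, learning_rate=0.1):
        # Memory use depends only on the regressor size, not on the game size.
        self.weights = np.zeros((num_actions, num_features))
        self.learning_rate = learning_rate

    def predict(self, features):
        """Estimate one regret value per action for the given feature vector."""
        return self.weights @ features

    def update(self, features, target_regrets):
        """Take one gradient step on squared error toward the observed regrets."""
        error = target_regrets - self.predict(features)
        self.weights += self.learning_rate * np.outer(error, features)


def policy_from_estimator(estimator, features):
    """Apply regret matching to estimated regrets instead of tabulated ones."""
    return regret_matching(estimator.predict(features))
```

In a full solver, the target regrets passed to `update` would come from counterfactual value computations during CFR iterations; that part is omitted here.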
Graduation date

Spring 2016
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/R3NZ80Z2Z
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Bowling, Michael (Computing Science)
Examining committee members and their departments
- Szafron, Duane (Computing Science)
- Hayward, Ryan (Computing Science)