Using Regret Estimation to Solve Games Compactly Open Access
- Other title
- Type of item
- Degree grantor
University of Alberta
- Author or creator
Morrill, Dustin R
- Supervisor and department
Bowling, Michael (Computing Science)
- Examining committee member and department
Szafron, Duane (Computing Science)
Hayward, Ryan (Computing Science)
Department of Computing Science
- Date accepted
- Graduation date
Master of Science
- Degree level
Game theoretic solution concepts, such as Nash equilibrium strategies that are optimal against worst case opponents, provide guidance in finding desirable autonomous agent behaviour. In particular, we wish to approximate solutions to complex, dynamic tasks, such as negotiation or bidding in auctions. Computational game theory investigates effective methods for computing such strategies. Solving human-scale games, however, is currently an intractable problem. Counterfactual Regret Minimization (CFR), is a regret-minimizing, online learning algorithm that dominates the Annual Computer Poker Competition (ACPC) and lends itself readily to various sampling and abstraction techniques. Abstract games are created to mirror the strategic elements of an original game in a more compact representation. The abstract game can be solved and the abstract game solution can be translated back into the full game. But crafting an abstract game requires domain-specific knowledge, and an abstraction can interact with the game solving process in unintuitive and harmful ways. For example, abstracting a game can create pathologies where solutions to more granular abstractions can be more exploitable against a worst-case opponent in the full game than those derived from simpler abstractions. An abstraction that could be dynamically changed and informed by the solution process could produce better solutions more consistently. We suggest that such abstractions can be largely subsumed by a regressor on game features that estimates regret during CFR. Replacing abstraction with a regressor allows the memory required to approximate a solution to a game to be proportional to the complexity of the regressor rather than the size of the game itself. Furthermore, the regressor essentially becomes a tunable, compact, and dynamic abstraction of the game that is informed by and adapts to the particular solution being computed. These properties will allow this technique to scale to previously intractable domains. We call this new algorithm Regression CFR (RCFR). In addition to showing that this approach is theoretically and practically sound, we improve RCFR by combining it with regret-matching+. Experiments involving two small poker games show that RCFR and its extension, RCFR+, show that it can approximately solve games with regressors that are drastically less complex than the game itself. In comparisons with traditional static abstractions of similar complexity, RCFR variants tend to produce less exploitable strategies.
- This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
- Citation for previous publication
Kevin Waugh, Dustin Morrill, J. Andrew Bagnell, and Michael Bowling. “Solving Games with Functional Regret Estimation”. In: Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-29, 2015, Austin Texas, USA. Austin Texas, USA, Jan. 2015, pp. 2138–2145.
- Date Uploaded
- Date Modified
- Audit Status
- Audits have not yet been run on this file.
File format: pdf (PDF/A)
Mime type: application/pdf
File size: 1250633
Last modified: 2016:06:16 16:53:38-06:00
Original checksum: 8fd5495428a762756ee5e6aaf0326287
Activity of users you follow