Evaluating AlphaZero in a Strongly Solved Game

Karki, Bigyan

doi:10.7939/r3-ab8h-xe50

Chinese Checkers, a traditional game played on a star-shaped board by 2-6 players, has been a domain for game AI research, and its two-player version has been strongly solved up to a 6×6 board with 6 pieces per player. In this work, we apply the AlphaZero algorithm, known for its success in two-player, deterministic, perfect-information games such as Chess, Shogi, and Go, to Chinese Checkers. We trained a custom AlphaZero agent on a 4×4 board with 3 pieces and on a 5×5 board with 6 pieces. The primary contributions include a parallelized version of AlphaZero, an evaluation of AlphaZero in perfect-information games, an exploration of the structure of the learning data, an assessment of learning accuracy on the training data, and measurements of generalization both on states similar to those observed and on random states. While AlphaZero agents have achieved superhuman performance in certain domains, recent analysis of KataGo revealed vulnerabilities to strategies that human players would not fall for, indicating potential performance gaps in AlphaZero. Our work studies the training and evaluation of agents in simplified variants of Chinese Checkers and identifies a decrease in the accuracy of AlphaZero's learned policy on states outside the training set. Even in these smaller variants, adversarial policies were able to exploit this shortcoming, leading the AlphaZero agent to make policy mistakes during play. We propose combining supervised and self-play training to mitigate these exploits, with the aim of making the AlphaZero agent more resilient to adversarial strategies.
Graduation date

Spring 2024
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-ab8h-xe50
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Sturtevant, Nathan