Evaluating AlphaZero in a Strongly Solved Game

  • Author / Creator
    Karki, Bigyan
  • Abstract
    Chinese Checkers, a traditional game played on a star-shaped board by 2-6 players, has been a domain for game AI research, and the two-player game has been strongly solved up to a 6 × 6 board with 6 pieces per player. In this work, we apply the AlphaZero algorithm, known for its success in perfect-information, two-player, deterministic games such as Chess, Shogi, and Go, to Chinese Checkers. We train a custom AlphaZero agent on a 4 × 4 board with 3 pieces and on a 5 × 5 board with 6 pieces. The primary contributions include a parallelized version of AlphaZero, an evaluation of AlphaZero in perfect-information games, an exploration of the structure of the learning data, an assessment of learning accuracy on the training data, and measurements of generalization both on states similar to those observed during training and on random states. While AlphaZero agents have achieved superhuman performance in certain domains, recent analysis of KataGo revealed vulnerabilities to certain strategies that human players would not fall for, indicating potential performance gaps in AlphaZero. Our work studies the nature of training and evaluating agents in simplified variants of Chinese Checkers, and identifies a decrease in the accuracy of AlphaZero's learned policy on states outside the training set. Even in smaller variants of Chinese Checkers, adversarial policies were able to exploit these shortcomings, leading to policy mistakes by the AlphaZero agent during play. We propose a combination of supervised and self-play training to mitigate these exploits, aiming to make the AlphaZero agent more resilient against adversarial strategies.

  • Subjects / Keywords
  • Graduation date
    Spring 2024
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-ab8h-xe50
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.