Outcome Prediction and Hierarchical Models in Real-Time Strategy Games

Stanescu, Adrian M

doi:10.7939/r3-rb0t-4v23

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Stanescu, Adrian M
For many years, traditional board games such as Chess, Checkers or Go have been the standard environments to test new Artificial Intelligence (AI) algorithms for achieving robust game-playing agents capable of defeating the best human players. Presently, the focus has shifted towards games that offer even larger action and state spaces, such as Atari and other video games. With a unique combination of strategic thinking and fine-grained tactical combat management, Real-Time Strategy (RTS) games have emerged as one of the most popular and challenging research environments. Besides state space complexity, RTS properties such as simultaneous actions, partial observability and real-time computing constraints make them an excellent test bed for decision-making algorithms under dynamic conditions.

This thesis makes contributions towards achieving human-level AI in these complex games. Specifically, we focus on learning, using abstractions and performing adversarial search in real-time domains with extremely large action and state spaces, for which forward models might not be available.

We present two abstract models for combat outcome prediction that are accurate while reasonably computationally inexpensive. These models can inform high-level strategic decisions, such as when to force or avoid fighting, or be used as evaluation functions for look-ahead search algorithms. In both cases we obtained stronger results than the state-of-the-art heuristics of the time. We introduce two approaches to designing adversarial look-ahead search algorithms that are based on abstractions to reduce the search complexity. Firstly, Hierarchical Adversarial Search uses multiple search layers that work at different abstraction levels to decompose the original problem. Secondly, Puppet Search methods use configurable scripts as an action abstraction mechanism and offer more design flexibility and control. Both methods perform similarly to top scripted and state-of-the-art search-based agents on small maps, while outperforming them on larger ones. We show how to use Convolutional Neural Networks (CNNs) to effectively improve spatial awareness and evaluate game outcomes more accurately than our previous combat models. When incorporated into adversarial look-ahead search algorithms, this evaluation function increased their playing strength considerably.

In these complex domains forward models might be very slow or even unavailable, which makes search methods more difficult to use. We show how policy networks can be used to mimic our Puppet Search algorithm and to bypass the need for a forward model during gameplay. We combine the much faster resulting method with other search-based tactical algorithms to produce RTS game-playing agents that are stronger than state-of-the-art algorithms. We then describe how to eliminate the need for simulators or forward models entirely by using Reinforcement Learning (RL) to learn autonomous, self-improving behaviors. The resulting agents defeated the built-in AI convincingly and showed complex cooperative behaviors in small-scale scenarios of a fully fledged RTS game. Finally, learning becomes more difficult when controlling increasing numbers of agents. We introduce a new approach that uses CNNs to produce a spatial decomposition mechanism and makes credit assignment from a single team reward signal more tractable. Applied to a standard Q-learning method, this approach resulted in increased performance over the original algorithm in both small- and large-scale scenarios.
Graduation date

Spring 2019
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/r3-rb0t-4v23
License

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Buro, Michael (Computing Science)