Single-Agent Optimization with Monte-Carlo Tree Search and Deep Reinforcement Learning

Seify, Arta

doi:doi:10.7939/r3-ydsv-6d91

This decommissioned ERA site remains active temporarily to support our final migration steps to https://ualberta.scholaris.ca, ERA's new home. All new collections and items, including Spring 2025 theses, are at that site. For assistance, please contact erahelp@ualberta.ca.

View

Download

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Usage

454 views
1266 downloads

Single-Agent Optimization with Monte-Carlo Tree Search and Deep Reinforcement Learning

Author / Creator

Seify, Arta
Single-agent optimization tasks, also referred to as single-player games, include any domain with an agent whose goal is to maximize an objective function(s), without interference from any other agents. Such tasks have been studied for decades. For example, in 2006, NASA automated the design of antennas by framing the problem as a single-agent optimization task.

The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is state-of-the-art in zero-sum two-player perfect-information games. In this thesis, my goal is to bring the success of these algorithms to single-player games. I begin by introducing a variant of MCTS that is suitable for games where the bounds on rewards is not known, which is the case in many optimization problems. My enhancements include using a general action-value normalization technique, as well as a virtual loss function, which enables effective search parallelization. I then introduce Policy-MCTS, which uses a deep policy network trained by generations of self-play to guide the search. Lastly, I provide initial work on using value estimations to entirely replace the rollouts of MCTS.

I gauge the effectiveness of my methods in SameGame, an NP-hard single-player test domain. I demonstrate that Policy-MCTS is competitive with state-of-the-art search-based methods on a benchmark set of positions.
Subjects / Keywords
Graduation date

Spring 2020
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-ydsv-6d91
License

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Buro, Michael (Computing Science)