A general framework for reducing variance in agent evaluation

White, Martha

doi:10.7939/R3CW3C



Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations



Author / Creator

White, Martha
In this work, we present a unified, general approach to variance reduction in agent evaluation that uses machine learning to minimize variance. Evaluating an agent's performance in a stochastic setting is necessary for agent development, scientific evaluation, and competitions. Traditionally, evaluation is done using Monte Carlo estimation (sample averages); however, the magnitude of the stochasticity in the domain or the high cost of sampling can often prevent this approach from yielding statistically significant conclusions. Recently, an advantage sum technique based on control variates has been proposed for constructing unbiased, low-variance estimates of agent performance. The technique requires an expert to define a value function over states of the system, essentially a guess of each state's unknown value. In this work, we propose learning this value function from past interactions between agents in some target population. Our learned value functions have two key advantages: they can be applied in domains where no expert value function is available, and they can result in tuned evaluation for a specific population of agents (e.g., novice versus advanced agents). This work has three main contributions. First, we consolidate previous work in using control variates for variance reduction into one unified, general framework and summarize the connections between this previous work. Second, our framework makes variance reduction practically possible in any sequential decision-making task where designing the expert value function is time-consuming, difficult, or essentially impossible. We prove the optimality of our approach and extend the theoretical understanding of advantage sum estimators. In addition, we significantly extend the applicability of advantage sum estimators and discuss practical methods for using our framework in real-world scenarios. Finally, we provide low-variance estimators for three poker domains previously without variance reduction and improve strategy selection in the expert-level University of Alberta poker bot.
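As a rough illustration of the idea described in the abstract, the sketch below compares plain Monte Carlo evaluation with a control-variate correction whose value function is learned from past games. It is not the thesis's implementation. The toy game, the constants, and the names `play_game`, `fit_value_function`, and `corrected_payoff` are all hypothetical, and the correction shown (subtracting, for each chance event, the learned value of the outcome minus its expectation under the known chance distribution) is one common form of a control variate, assumed here for simplicity.

```python
import random
import statistics

CARDS = [-1, 0, 1]  # chance outcomes, dealt uniformly at random (assumed known)
TRUE_SKILL = 0.3    # hypothetical per-round edge of the agent being evaluated


def play_game(rng, rounds=10):
    """Play one toy game; return (total payoff, cards dealt by chance)."""
    cards = [rng.choice(CARDS) for _ in range(rounds)]
    # Payoff mixes a small skill signal with a large luck component per round.
    payoff = sum(TRUE_SKILL + 5.0 * c + rng.gauss(0.0, 0.5) for c in cards)
    return payoff, cards


def fit_value_function(past_games):
    """Learn V(card) = slope * card by least squares on past games."""
    xs = [sum(cards) for _, cards in past_games]
    ys = [payoff for payoff, _ in past_games]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda card: slope * card


def corrected_payoff(payoff, cards, value):
    """Subtract the zero-mean luck term: sum over rounds of V(c) - E[V(c)]."""
    # Cards are uniform, so the plain mean over CARDS is the exact expectation.
    expected = statistics.fmean([value(c) for c in CARDS])
    return payoff - sum(value(c) - expected for c in cards)


rng = random.Random(0)
# Learn the value function from past games, then evaluate on fresh games.
value = fit_value_function([play_game(rng) for _ in range(2000)])
games = [play_game(rng) for _ in range(2000)]
raw = [payoff for payoff, _ in games]
corrected = [corrected_payoff(payoff, cards, value) for payoff, cards in games]

print("Monte Carlo: mean %.2f, variance %.2f"
      % (statistics.fmean(raw), statistics.variance(raw)))
print("Corrected:   mean %.2f, variance %.2f"
      % (statistics.fmean(corrected), statistics.variance(corrected)))
```

Because the value function is fitted on separate past games and the chance distribution is known, the subtracted term has zero expectation whatever the fitted slope is, so the corrected estimate stays unbiased. A better fit only lowers the variance.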
Graduation date

Spring 2010
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/R3CW3C
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Schuurmans, Dale (Computing Science)
- Bowling, Michael (Computing Science)
Examining committee members and their departments
- Szafron, Duane (Computing Science)
- Hooper, Peter (Mathematical and Statistical Sciences)