




This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations

A general framework for reducing variance in agent evaluation
Open Access


Other title
agent evaluation
variance reduction
machine learning
Type of item
Degree grantor
University of Alberta
Author or creator
White, Martha
Supervisor and department
Bowling, Michael (Computing Science)
Schuurmans, Dale (Computing Science)
Examining committee member and department
Szafron, Duane (Computing Science)
Hooper, Peter (Mathematical and Statistical Sciences)
Department of Computing Science

Date accepted
Graduation date
Master of Science
Degree level
In this work, we present a unified, general approach to variance reduction in agent evaluation that uses machine learning to minimize the variance of performance estimates. Evaluating an agent's performance in a stochastic setting is necessary for agent development, scientific evaluation, and competitions. Traditionally, evaluation is done using Monte Carlo estimation (sample averages); however, the magnitude of the stochasticity in the domain or the high cost of sampling can often prevent this approach from yielding statistically significant conclusions. Recently, an advantage sum technique based on control variates has been proposed for constructing unbiased, low-variance estimates of agent performance. The technique requires an expert to define a value function over states of the system, essentially a guess of each state's unknown value. In this work, we propose learning this value function from past interactions between agents in some target population. Our learned value functions have two key advantages: they can be applied in domains where no expert value function is available, and they can result in evaluation tuned to a specific population of agents (e.g., novice versus advanced agents). This work has three main contributions. First, we consolidate previous work on using control variates for variance reduction into one unified, general framework and summarize the connections between these earlier approaches. Second, our framework makes variance reduction practically possible in any sequential decision-making task where designing the expert value function is time-consuming, difficult, or essentially impossible. We prove the optimality of our approach and extend the theoretical understanding of advantage sum estimators. In addition, we significantly extend the applicability of advantage sum estimators and discuss practical methods for using our framework in real-world scenarios. Finally, we provide low-variance estimators for three poker domains previously without variance reduction and improve strategy selection in the expert-level University of Alberta poker bot.
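The control-variate idea underlying the abstract can be illustrated with a minimal sketch. This is not the thesis's advantage sum estimator or its learning procedure: it is a generic control-variate correction on a toy problem, and every name in it (`play_match`, `fit_coefficient`, `evaluate`, the skill value 0.1, and the noise scales) is a hypothetical chosen for illustration. The sketch assumes each match exposes a "luck" term with known expectation zero, fits a single scaling coefficient on past matches, and then applies it to fresh matches so that the corrected estimate stays unbiased.

```python
import random
import statistics


def play_match(rng, skill=0.1):
    """Toy match (hypothetical): payoff = skill edge + zero-mean luck + noise."""
    luck = rng.gauss(0.0, 1.0)    # chance events, with known expectation 0
    noise = rng.gauss(0.0, 0.2)   # randomness the luck term does not explain
    return skill + luck + noise, luck


def fit_coefficient(past):
    """Fit c = Cov(payoff, luck) / Var(luck) using past matches only."""
    payoffs = [p for p, _ in past]
    lucks = [l for _, l in past]
    mean_p = statistics.fmean(payoffs)
    mean_l = statistics.fmean(lucks)
    cov = sum((p - mean_p) * (l - mean_l) for p, l in past) / (len(past) - 1)
    return cov / statistics.variance(lucks)


def evaluate(n_past=2000, n_eval=2000, seed=0):
    """Compare the plain sample average with a control-variate estimate."""
    rng = random.Random(seed)
    # The coefficient is fitted on separate past data, so it is independent
    # of the evaluation matches and the correction has expectation zero.
    past = [play_match(rng) for _ in range(n_past)]
    c = fit_coefficient(past)

    fresh = [play_match(rng) for _ in range(n_eval)]
    plain = [p for p, _ in fresh]
    corrected = [p - c * l for p, l in fresh]  # subtract the scaled luck term
    return {
        "plain_mean": statistics.fmean(plain),
        "plain_var": statistics.variance(plain),
        "corrected_mean": statistics.fmean(corrected),
        "corrected_var": statistics.variance(corrected),
    }


if __name__ == "__main__":
    for name, value in evaluate().items():
        print(f"{name}: {value:.4f}")
```

Both estimators target the same expected payoff, but the corrected samples have much lower per-match variance in this toy setting, so fewer matches are needed to reach a given confidence level. Fitting the coefficient on past matches loosely mirrors the abstract's idea of learning from past interactions, under the simplifying assumption that the zero-mean luck term is directly observable.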
License granted by Martha White on 2010-01-03T20:50:44Z (GMT): Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of the above terms. The author reserves all other publication and other rights in association with the copyright in the thesis, and except as herein provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
Citation for previous publication

File Details

Date Uploaded
Date Modified
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 392540
Last modified: 2015:10:12 20:37:15-06:00
Filename: martha_white_msc_thesis.pdf
Original checksum: a7b74b99328f13cf43d626a1fb9652c2
Well formed: false
Valid: false
Status message: Lexical error offset=386124
Page count: 63