
Regret Minimization with Function Approximation in Extensive-Form Games

  • Author / Creator
    D'Orazio, Ryan
  • Computing a Nash equilibrium in zero-sum games,
    or more generally solving saddle point
    optimization problems, is a fundamental problem
    in game theory and machine learning, with
    applications spanning a wide variety of domains,
    from generative modeling and computer vision to
    super-human AI in imperfect-information games
    like poker.
    Despite the broad applicability of Nash
    equilibria, traditional methods from optimization
    and machine learning are not directly applicable.
    In zero-sum games, however, a simple and
    effective method exists -- self-play with online
    learning.
    In this setup, an equilibrium is computed by
    pitting two learning algorithms against each
    other in repeated play of the game.
    Online learning with self-play via Counterfactual
    Regret Minimization (CFR) is the leading
    approach for saddle point computation in large games
    with sequential decision making and imperfect
    information.
    For very large games, CFR can be scaled along
    several dimensions, including sampling, subgame
    decomposition, and function approximation.
    Despite the growing interest in scaling
    algorithms with function approximation in areas
    such as reinforcement learning, current
    theoretical guarantees for CFR with function
    approximation are minimal.
    In this thesis we extend theoretical results for
    CFR with function approximation, and
    complement these worst-case guarantees with experiments
    on several common benchmark games with sequential
    decision making and imperfect information.
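
    The self-play procedure described above can be
    sketched concretely. The following is a minimal
    illustration (our own, not from the thesis) of
    two regret-matching learners -- the regret
    minimizer underlying CFR -- playing a small
    zero-sum matrix game; their time-averaged
    strategies approach a Nash equilibrium. The
    payoff matrix and all names are illustrative
    choices.

```python
import numpy as np

# Illustrative zero-sum matrix game (row player's payoffs).
# Its unique Nash equilibrium has both players mixing (0.4, 0.6).
PAYOFF = np.array([[2.0, -1.0],
                   [-1.0, 1.0]])

def regret_matching(regrets):
    """Map cumulative regrets to a mixed strategy.

    Positive regrets are normalized; with no positive
    regret, play uniformly."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    if total > 0:
        return pos / total
    return np.full(len(regrets), 1.0 / len(regrets))

def self_play(iterations=50000):
    """Pit two regret-matching learners against each other.

    Returns the time-averaged strategy of each player; by the
    standard regret-to-equilibrium argument, these averages
    converge to a Nash equilibrium of the zero-sum game."""
    n_row, n_col = PAYOFF.shape
    regrets = [np.zeros(n_row), np.zeros(n_col)]
    sums = [np.zeros(n_row), np.zeros(n_col)]
    for _ in range(iterations):
        s1 = regret_matching(regrets[0])
        s2 = regret_matching(regrets[1])
        sums[0] += s1
        sums[1] += s2
        u1 = PAYOFF @ s2        # row player's value for each action
        u2 = -(s1 @ PAYOFF)     # column player's value (zero-sum)
        # Instantaneous regret: action value minus realized value.
        regrets[0] += u1 - s1 @ u1
        regrets[1] += u2 - s2 @ u2
    return sums[0] / iterations, sums[1] / iterations

avg_row, avg_col = self_play()
# Both averages approach the equilibrium mix (0.4, 0.6).
```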

    The thesis is organized as follows.
    First, relevant background is given by defining
    external regret -- a quantity used to evaluate
    online learning algorithms.
    The connection between regret, self-play, and
    general concave-convex saddle point problems is
    then presented.
    A generalization of external regret for the
    online decision problem is also reviewed.
    The main theoretical contributions are then
    presented, generalizing previous work to
    different types of regret in the online decision
    problem and to different algorithms.
    The new theoretical guarantees with function
    approximation give rise to two new families of
    algorithms, f-RCFR and f-RCFR+, which combine
    function approximation with CFR-like algorithms.
    Both families of algorithms are then compared
    across several games, with f-RCFR+ demonstrating
    superior performance and better management of
    function approximator capacity.
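
    To make the central evaluation quantity concrete:
    external regret compares the payoff the learner
    accumulated with the payoff of the best single
    action chosen in hindsight. A small sketch (our
    own illustration; the function and variable names
    are not from the thesis):

```python
import numpy as np

def external_regret(strategies, utility_vectors):
    """External regret of an online learner after T rounds.

    strategies: list of T probability vectors the learner played.
    utility_vectors: list of T utility vectors (one entry per
        action) revealed after each round.
    Returns the total payoff of the best fixed action in
    hindsight minus the learner's realized expected payoff."""
    # Total payoff each fixed action would have earned.
    cumulative = np.sum(utility_vectors, axis=0)
    # Expected payoff the learner actually earned.
    earned = sum(s @ u for s, u in zip(strategies, utility_vectors))
    return cumulative.max() - earned

# A learner that always plays uniformly over two actions:
strategies = [np.array([0.5, 0.5])] * 4
utilities = [np.array([1.0, 0.0])] * 4   # action 0 was always best
# Best fixed action earns 4; the uniform learner earned 2.
regret = external_regret(strategies, utilities)  # → 2.0
```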

  • Graduation date
    Fall 2020
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-040j-9e84
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.