ERA users may experience an intermittent Deposit/Save error. We apologize for any inconvenience this may cause. Thank you for your patience while we work to resolve the issue. When the work is completed, we'll remove this notice.
Technical Reports (Computing Science)
Technical Reports Collection
Items in this Collection
- 9Bowling, Michael
- 4Schuurmans, Dale
- 4Wang, Tao
- 4Zinkevich, Martin
- 3Lizotte, Daniel
- 2Johanson, Michael
Technical report TR06-26. We investigate the dual approach to dynamic programming and reinforcement learning, based on maintaining an explicit representation of stationary distributions as opposed to value functions. A significant advantage of the dual approach is that it allows one to exploit...
Online learning aims to perform nearly as well as the best hypothesis in hindsight. For some hypothesis classes, though, even finding the best hypothesis offline is challenging. In such offline cases, local search techniques are often employed and only local optimality guaranteed. For online...
Technical report TR07-14. Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for...
Technical report TR07-15. Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counter-strategy to the inferred posterior of the other agents' behavior. In the experts paradigm, one may want to choose...
Technical report TR08-16. We propose a dual approach to dynamic programming and reinforcement learning based on maintaining an explicit representation of visit distributions as opposed to value functions. An advantage of working in the dual is that it allows one to exploit techniques for...
Technical report TR07-10. We propose to use a new dual approach to dynamic programming. The idea is to maintain an explicit representation of stationary distributions as opposed to value functions. A significant advantage of the dual approach is that it allows one to exploit well developed...
Technical report TR09-15. Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the...
Technical report TR07-05. We investigate novel, dual algorithms for dynamic programming and reinforcement learning, based on maintaining explicit representations of stationary distributions instead of value functions. In particular, we investigate the convergence properties of standard dynamic...
Technical report TR04-11. Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to...