Analysis of an Alternate Policy Gradient Estimator for Softmax Policies

Spring 2022

Garg, Shivam

Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, which refers to the situation when the policy concentrates its probability mass on sub-optimal actions. Sub-optimal policy saturation may arise from a bad policy initialization or a...
Bootstrap Learning of Heuristic Functions

Fall 2010

Jabbari Arfaee, Shahab

We investigate the use of machine learning to create effective heuristics for single-agent search. Our method aims to generate a sequence of heuristics from a given weak heuristic h_0 and a set of unlabeled training instances using a bootstrapping procedure. The training instances that can be...
Natural Actor-Critic Algorithms

2009

Bhatnagar, Shalabh; Sutton, Richard; Ghavamzadeh, Mohammad; Lee, Mark

Technical report TR09-10. We present four new reinforcement learning algorithms based on actor-critic, function approximation, and natural gradient ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which...
Reinforcement Learning Algorithms for MDPs

2009

Szepesvari, Csaba

Technical report TR09-13. This article presents a survey of reinforcement learning algorithms for Markov Decision Processes (MDPs). In the first half of the article, the problem of value estimation is considered. Here we start by describing the idea of bootstrapping and temporal difference...
