Search Results
-
Spring 2022
Policy gradient (PG) estimators are ineffective at dealing with softmax policies that are sub-optimally saturated, that is, when the policy concentrates its probability mass on sub-optimal actions. Sub-optimal policy saturation may arise from a bad policy initialization or a...
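To make the saturation effect concrete, here is a minimal sketch (an illustration, not code from the thesis; the three-armed bandit, its rewards, and the logits are assumptions) that computes the exact policy gradient of the expected reward for a softmax policy. As the logits push probability mass onto a sub-optimal arm, the gradient norm collapses, which is why gradient-based updates barely move a saturated policy.

```python
# Minimal sketch: why policy-gradient updates stall when a softmax policy is
# sub-optimally saturated (nearly all probability mass on a sub-optimal action).
# The bandit rewards and logit settings are illustrative assumptions.
import numpy as np

def softmax(logits):
    z = logits - logits.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rewards = np.array([1.0, 0.0, 0.0])    # action 0 is optimal

def exact_policy_gradient(logits):
    """Exact gradient of J(theta) = E_pi[r] for a softmax policy on a bandit:
    dJ/dtheta_a = pi(a) * (r(a) - J)."""
    pi = softmax(logits)
    J = pi @ rewards
    return pi * (rewards - J)

for saturation in [0.0, 5.0, 10.0, 20.0]:
    # Larger 'saturation' pushes probability mass onto sub-optimal action 2.
    logits = np.array([0.0, 0.0, saturation])
    grad = exact_policy_gradient(logits)
    print(f"saturation={saturation:5.1f}  pi={softmax(logits).round(4)}  "
          f"|grad|={np.linalg.norm(grad):.2e}")
```

Running this shows the gradient norm shrinking by many orders of magnitude as the saturation parameter grows, even though the policy is concentrated on the wrong action.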
-
Computationally effective optimization methods for complex process control and scheduling problems
Fall 2011
Over the years, reducing operational costs, raising profits, and enhancing operational safety have attracted tremendous interest in the chemical and petroleum industries. Since the regulatory control strategy may not meet such rigorous requirements, higher-level process control activities,...
-
2009
Bhatnagar, Shalabh; Sutton, Richard; Ghavamzadeh, Mohammad; Lee, Mark
Technical report TR09-10. We present four new reinforcement learning algorithms based on actor-critic, function approximation, and natural gradient ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which...
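As a rough illustration of the actor-critic template the abstract describes (a critic that tracks a value estimate and an actor updated along the policy gradient), here is a minimal sketch with linear function approximation. It is not one of the four TR09-10 algorithms; the chain environment, one-hot features, and step sizes are assumptions.

```python
# Minimal sketch of a generic online actor-critic with linear function
# approximation: the critic learns state values via TD(0), and the actor takes
# policy-gradient steps driven by the TD error. Environment and constants are
# illustrative assumptions, not the algorithms of TR09-10.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2              # chain of 5 states; actions: 0 = left, 1 = right
gamma, alpha_v, alpha_pi = 0.95, 0.1, 0.01

def features(s):
    x = np.zeros(n_states)              # one-hot features (tabular linear FA)
    x[s] = 1.0
    return x

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r, s_next == n_states - 1

v_w = np.zeros(n_states)                # critic weights (state values)
theta = np.zeros((n_states, n_actions)) # actor parameters (softmax logits)

def policy(s):
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

for episode in range(500):
    s = 0
    for _ in range(50):
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        s_next, r, done = step(s, a)
        # TD error from the critic's linear value estimate
        target = r + (0.0 if done else gamma * v_w @ features(s_next))
        delta = target - v_w @ features(s)
        v_w += alpha_v * delta * features(s)      # critic: TD(0) update
        grad_log = -p                             # d log pi(a|s) / d theta[s]
        grad_log[a] += 1.0
        theta[s] += alpha_pi * delta * grad_log   # actor: policy-gradient step
        s = s_next
        if done:
            break

print("learned right-action probabilities per state:",
      [round(policy(s)[1], 2) for s in range(n_states)])
```

The critic's step size is larger than the actor's, mirroring the two-timescale structure typically used so that the critic effectively tracks the value of the slowly changing policy.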