Subjects:
- policy gradient (2)
- reinforcement learning (2)
- CCEM (1)
- actor-critic (1)
- artificial intelligence (1)
- conditional cross-entropy optimization (1)
-
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Fall 2020
Policy gradient methods typically estimate both an explicit policy and a value function. The long-standing view of policy gradient methods as approximate policy iteration, alternating between policy evaluation and policy improvement by greedification, is a helpful framework to elucidate algorithmic...
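For context, the two greedification operators named in the title are commonly formulated as KL projections of the policy onto a Boltzmann distribution over the current action values. The following is a standard sketch of that formulation (with an assumed temperature \tau), not a quotation from the thesis:

\mathcal{B}Q(a \mid s) \propto \exp\!\big(Q(s,a)/\tau\big)

Reverse KL greedification: \pi' = \arg\min_{\pi} \mathrm{KL}\big(\pi(\cdot \mid s) \,\|\, \mathcal{B}Q(\cdot \mid s)\big)

Forward KL greedification: \pi' = \arg\min_{\pi} \mathrm{KL}\big(\mathcal{B}Q(\cdot \mid s) \,\|\, \pi(\cdot \mid s)\big)

The reverse direction tends to be mode-seeking and the forward direction mass-covering, which is the kind of contrast the title points to.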
-
Fall 2022
Actor-Critics are a popular class of algorithms for control. Their ability to learn complex behaviours in continuous-action environments makes them directly applicable to many real-world scenarios. These algorithms are composed of two parts: a critic and an actor. The critic learns to critique...
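As a rough illustration of the actor/critic split described above, here is a minimal, generic one-step actor-critic sketch. The toy MDP, step sizes, and update form are assumptions for illustration only and are not taken from the thesis.

import numpy as np

# Minimal one-step actor-critic on an assumed toy 2-state, 2-action MDP.
n_states, n_actions = 2, 2
rng = np.random.default_rng(0)

def step(s, a):
    # Assumed dynamics: action 1 in state 0 moves to state 1 with reward +1;
    # any action in state 1 ends the episode.
    if s == 0 and a == 1:
        return 1, 1.0, False
    return 0, 0.0, s == 1

theta = np.zeros((n_states, n_actions))  # actor: softmax policy parameters
v = np.zeros(n_states)                   # critic: state-value estimates
alpha_actor, alpha_critic, gamma = 0.1, 0.2, 0.99

def policy(s):
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

for episode in range(500):
    s, done = 0, False
    while not done:
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        s_next, r, done = step(s, a)

        # Critic: one-step TD error and value update.
        td_error = r + (0.0 if done else gamma * v[s_next]) - v[s]
        v[s] += alpha_critic * td_error

        # Actor: policy-gradient step, using the TD error as the critic's "critique".
        grad_log_pi = -p
        grad_log_pi[a] += 1.0
        theta[s] += alpha_actor * td_error * grad_log_pi

        s = s_next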