Author / Creator / Contributor

Chan, Alan

Supervisors

White, Martha (Computing Science)

Subject / Keyword

artificial intelligence
policy gradient
reinforcement learning

Collections

Graduate and Postdoctoral Studies (GPS), Faculty of
Graduate and Postdoctoral Studies (GPS), Faculty of/Theses and Dissertations

Languages

English

Item type

Thesis

Departments

Department of Computing Science

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

Fall 2020

Chan, Alan

Policy gradient methods typically estimate both explicit policy and value functions. The long-extant view of policy gradient methods as approximate policy iteration (alternating between policy evaluation and policy improvement by greedification) is a helpful framework to elucidate algorithmic...
