This decommissioned ERA site remains active temporarily to support our final migration steps to https://ualberta.scholaris.ca, ERA's new home. All new collections and items, including Spring 2025 theses, are at that site. For assistance, please contact erahelp@ualberta.ca.
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Fall 2020
Policy gradient methods typically estimate both explicit policy and value functions. The long-extant view of policy gradient methods as approximate policy iteration---alternating between policy evaluation and policy improvement by greedification---is a helpful framework to elucidate algorithmic...