Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Fall 2020
Policy gradient methods typically estimate both explicit policy and value functions. The long-extant view of policy gradient methods as approximate policy iteration---alternating between policy evaluation and policy improvement by greedification---is a helpful framework to elucidate algorithmic...