Subject / Keyword
- Alternate policy gradient estimator
- Policy gradient methods
- Policy gradient methods for non-stationary environments
- Policy saturation
- Reinforcement Learning
- Softmax gravity well
Spring 2022
Policy gradient (PG) estimators are ineffective for softmax policies that are sub-optimally saturated, that is, policies that concentrate their probability mass on sub-optimal actions. Sub-optimal policy saturation may arise from a bad policy initialization or a...