Subject / Keyword
- Reinforcement Learning (4)
- Real-Time Learning (2)
- Alternate policy gradient estimator (1)
- Application of Reinforcement Learning (1)
- Control (1)
- Deep Reinforcement Learning (1)
-
Spring 2022
Policy gradient (PG) estimators are ineffective for softmax policies that are sub-optimally saturated, that is, policies that concentrate their probability mass on sub-optimal actions. Sub-optimal policy saturation may arise from a bad policy initialization or a...
-
Fall 2023
Off-policy policy evaluation has been a critical and challenging problem in reinforcement learning, and Temporal-Difference (TD) learning is one of the most important approaches for addressing it. There has been significant interest in searching for off-policy TD algorithms which find the same...