Search Results
-
Spring 2022
Policy gradient (PG) estimators are ineffective at dealing with softmax policies that are sub-optimally saturated, that is, when the policy concentrates its probability mass on sub-optimal actions. Sub-optimal policy saturation may arise from a bad policy initialization or a...
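To make the saturation effect concrete, here is a minimal sketch (an illustration, not code from the thesis; the three-armed bandit, its rewards, and the logits are assumptions) that computes the exact policy gradient of the expected reward for a softmax policy. As the logits push probability mass onto a sub-optimal arm, the gradient norm collapses, which is why gradient-based updates barely move a saturated policy.

```python
# Minimal sketch: why policy-gradient updates stall when a softmax policy is
# sub-optimally saturated (nearly all probability mass on a sub-optimal action).
# The bandit rewards and logit settings are illustrative assumptions.
import numpy as np

def softmax(logits):
    z = logits - logits.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rewards = np.array([1.0, 0.0, 0.0])    # action 0 is optimal

def exact_policy_gradient(logits):
    """Exact gradient of J(theta) = E_pi[r] for a softmax policy on a bandit:
    dJ/dtheta_a = pi(a) * (r(a) - J)."""
    pi = softmax(logits)
    J = pi @ rewards
    return pi * (rewards - J)

for saturation in [0.0, 5.0, 10.0, 20.0]:
    # Larger 'saturation' pushes probability mass onto sub-optimal action 2.
    logits = np.array([0.0, 0.0, saturation])
    grad = exact_policy_gradient(logits)
    print(f"saturation={saturation:5.1f}  pi={softmax(logits).round(4)}  "
          f"|grad|={np.linalg.norm(grad):.2e}")
```

Running this shows the gradient norm shrinking by many orders of magnitude as the saturation parameter grows, even though the policy is concentrated on the wrong action.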
-
Computationally effective optimization methods for complex process control and scheduling problems
Fall 2011
Over the years, reducing operational costs, raising profits, and enhancing operational safety have attracted tremendous interest in the chemical and petroleum industries. Since the regulatory control strategy may not meet such rigorous requirements, higher-level process control activities,...
-
2009
Bhatnagar, Shalabh; Sutton, Richard; Ghavamzadeh, Mohammad; Lee, Mark
Technical report TR09-10. We present four new reinforcement learning algorithms based on actor-critic, function approximation, and natural gradient ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which...
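As a rough illustration of the actor-critic template the abstract describes (a critic that tracks a value estimate and an actor updated along the policy gradient), here is a minimal sketch with linear function approximation. It is not one of the four TR09-10 algorithms; the chain environment, one-hot features, and step sizes are assumptions.

```python
# Minimal sketch of a generic online actor-critic with linear function
# approximation: the critic learns state values via TD(0), and the actor takes
# policy-gradient steps driven by the TD error. Environment and constants are
# illustrative assumptions, not the algorithms of TR09-10.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2              # chain of 5 states; actions: 0 = left, 1 = right
gamma, alpha_v, alpha_pi = 0.95, 0.1, 0.01

def features(s):
    x = np.zeros(n_states)              # one-hot features (tabular linear FA)
    x[s] = 1.0
    return x

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r, s_next == n_states - 1

v_w = np.zeros(n_states)                # critic weights (state values)
theta = np.zeros((n_states, n_actions)) # actor parameters (softmax logits)

def policy(s):
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

for episode in range(500):
    s = 0
    for _ in range(50):
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        s_next, r, done = step(s, a)
        # TD error from the critic's linear value estimate
        target = r + (0.0 if done else gamma * v_w @ features(s_next))
        delta = target - v_w @ features(s)
        v_w += alpha_v * delta * features(s)      # critic: TD(0) update
        grad_log = -p                             # d log pi(a|s) / d theta[s]
        grad_log[a] += 1.0
        theta[s] += alpha_pi * delta * grad_log   # actor: policy-gradient step
        s = s_next
        if done:
            break

print("learned right-action probabilities per state:",
      [round(policy(s)[1], 2) for s in range(n_states)])
```

The critic's step size is larger than the actor's, mirroring the two-timescale structure typically used so that the critic effectively tracks the value of the slowly changing policy.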