  • Spring 2015

    Das Gupta, Ujjwal

    Much of the focus on finding good representations in reinforcement learning has been on learning complex non-linear predictors of value. Methods like policy gradient, which do not learn a value function and instead represent the policy directly, often need fewer parameters to learn good policies....
