  • Fall 2023

    He, Jiamin

    Off-policy policy evaluation is a critical and challenging problem in reinforcement learning, and temporal-difference (TD) learning is one of the most important approaches for addressing it. There has been significant interest in searching for off-policy TD algorithms that find the same...
