Linear Least-squares Dyna-style Planning

  • Author(s) / Creator(s)
  • Technical report TR11-04. A world model is central to model-based reinforcement learning. In Dyna, for example, the model is used both in learning steps, to select actions, and in planning steps, to project sampled states or features. In this paper we propose the least-squares Dyna (LS-Dyna) algorithm, which improves the accuracy of the world model and thereby provides better planning. LS-Dyna is a special Dyna architecture in that it estimates the world model by a least-squares method. LS-Dyna is more data efficient than the existing linear Dyna, which estimates the world model by gradient descent, yet it has the same complexity. Furthermore, the least-squares model is computed in an online, recursive fashion, so it requires neither recording historical experience nor tuning a step-size. Experimental results on a 98-state Boyan chain example and a Mountain-car problem show that LS-Dyna performs significantly better than TD/Q-learning and the gradient-descent linear Dyna algorithm. | TRID-ID TR11-04

  • Date created
  • Subjects / Keywords
  • Type of Item
  • DOI
  • License
    Attribution 3.0 International
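
The abstract describes estimating a linear world model by least squares in an online, recursive fashion, without stored experience or a step-size. The sketch below illustrates that general idea with a textbook recursive least-squares update (Sherman-Morrison form); it is not taken from the report and may differ from the authors' algorithm. The class name `RecursiveLinearModel`, the regularizer `delta`, the one-hot features, and the two-state toy chain are all assumptions made for illustration.

```python
class RecursiveLinearModel:
    """Recursive least-squares estimate of a linear world model.

    Assumed model form (not taken from the report):
        next features  phi' ~ F @ phi
        reward         r    ~ b . phi
    """

    def __init__(self, n, delta=1e-3):
        self.n = n
        # P tracks the inverse of (delta * I + sum of phi phi^T).
        self.P = [[(1.0 / delta if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
        self.F = [[0.0] * n for _ in range(n)]  # transition model
        self.b = [0.0] * n                      # reward model

    def update(self, phi, reward, phi_next):
        n = self.n
        # Gain vector k = P phi / (1 + phi^T P phi).
        P_phi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(phi[i] * P_phi[i] for i in range(n))
        k = [v / denom for v in P_phi]

        # Prediction errors of the current model on this transition.
        pred = [sum(self.F[i][j] * phi[j] for j in range(n)) for i in range(n)]
        err_next = [phi_next[i] - pred[i] for i in range(n)]
        err_r = reward - sum(self.b[i] * phi[i] for i in range(n))

        # Correct both models along the gain direction; no step-size is needed.
        for i in range(n):
            for j in range(n):
                self.F[i][j] += err_next[i] * k[j]
            self.b[i] += err_r * k[i]

        # Sherman-Morrison update of P (P is symmetric, so phi^T P = (P phi)^T).
        for i in range(n):
            for j in range(n):
                self.P[i][j] -= k[i] * P_phi[j]


# Toy usage: a deterministic two-state chain with one-hot features.
# State 0 moves to state 1 with reward 1; state 1 moves to state 0 with reward 0.
model = RecursiveLinearModel(n=2)
for _ in range(50):
    model.update([1.0, 0.0], 1.0, [0.0, 1.0])
    model.update([0.0, 1.0], 0.0, [1.0, 0.0])
```

Each transition is used once and then discarded, and the cost per update is quadratic in the number of features. A planning step in a Dyna-style loop could then project a sampled feature vector through `model.F` and `model.b` in place of real experience.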