Linear Least-squares Dyna-style Planning

Yao, Hengshuai

doi:10.7939/R3QB9V881

Communities and Collections

Computing Science, Department of / Technical Reports (Computing Science)

Author(s) / Creator(s)
- Yao, Hengshuai
Technical report TR11-04. A world model is central to model-based reinforcement learning. In Dyna, for example, the model is used both in learning steps, to select actions, and in planning steps, to project sampled states or features. In this paper we propose the least-squares Dyna (LS-Dyna) algorithm, which improves the accuracy of the world model and thereby provides better planning. LS-Dyna is a special Dyna architecture in that it estimates the world model by a least-squares method. LS-Dyna is more data efficient than the existing linear Dyna, which estimates the world model by gradient descent, yet it has the same complexity. Furthermore, the least-squares model is computed in an online recursive fashion, so it does not need to record historical experience or tune a step-size. Experimental results on a 98-state Boyan chain example and a Mountain-car problem show that LS-Dyna performs significantly better than TD/Q-learning and the gradient-descent linear Dyna algorithm.
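To make the idea of an online, recursive least-squares world model concrete, here is a minimal sketch. It is not taken from the report: it assumes a linear model in which the next feature vector is approximated by F phi and the reward by b . phi, and it uses the standard Sherman-Morrison rank-one update. The class name `RecursiveLinearModel`, the regularizer `eps`, and the method names are all hypothetical.

```python
class RecursiveLinearModel:
    """Recursive least-squares fit of phi_next ~ F phi and reward ~ b . phi.

    Illustrative sketch only; names and the eps regularizer are assumptions,
    not the report's algorithm.
    """

    def __init__(self, n_features, eps=1e-3):
        n = n_features
        self.n = n
        self.F = [[0.0] * n for _ in range(n)]  # transition model (assumed form)
        self.b = [0.0] * n                      # reward model (assumed form)
        # P tracks the inverse of (eps * I + sum of phi phi^T).
        self.P = [[1.0 / eps if i == j else 0.0 for j in range(n)]
                  for i in range(n)]

    def predict(self, phi):
        """Project phi one step ahead: (predicted next phi, predicted reward)."""
        n = self.n
        phi_next = [sum(self.F[i][j] * phi[j] for j in range(n))
                    for i in range(n)]
        reward = sum(self.b[j] * phi[j] for j in range(n))
        return phi_next, reward

    def update(self, phi, reward, phi_next):
        """Fold one transition into the model; no step-size, no stored history."""
        n = self.n
        P_phi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(phi[i] * P_phi[i] for i in range(n))
        gain = [v / denom for v in P_phi]

        # Correct both models along the gain direction by their prediction errors.
        pred_next, pred_reward = self.predict(phi)
        for i in range(n):
            err = phi_next[i] - pred_next[i]
            for j in range(n):
                self.F[i][j] += err * gain[j]
        r_err = reward - pred_reward
        for j in range(n):
            self.b[j] += r_err * gain[j]

        # Sherman-Morrison rank-one update of P (P stays symmetric).
        for i in range(n):
            for j in range(n):
                self.P[i][j] -= gain[i] * P_phi[j]


# Usage: feed one observed transition, then project a feature vector.
model = RecursiveLinearModel(2)
model.update([1.0, 0.0], 1.0, [0.0, 1.0])
projected_phi, projected_reward = model.predict([1.0, 0.0])
```

In this sketch each update costs O(n^2) in the number of features and keeps only F, b, and P, which is one way to avoid both a step-size parameter and a buffer of past experience.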
Date created

2011
Type of Item

Report
DOI

https://doi.org/10.7939/R3QB9V881
License

Attribution 3.0 International

Language
- English