Permanent link (DOI): https://doi.org/10.7939/R3FF3M75H
This file is in the following communities:
|Graduate Studies and Research, Faculty of|
This file is in the following collections:
|Theses and Dissertations|
DEVELOPING A PREDICTIVE APPROACH TO KNOWLEDGE Open Access
- Degree grantor
University of Alberta
- Author or creator
White, Adam, M
- Supervisor and department
Richard S. Sutton (Computing Science)
- Examining committee member and department
Marek Reformat (Electrical and Computer Engineering)
Pierre Boulanger (Computing Science)
Pierre-Yves Oudeyer (French National Institute for Computer Science and Applied Mathematics, Bordeaux, France)
Michael Bowling (Computing Science)
Department of Computing Science
Doctor of Philosophy
- Degree level
Understanding how an artificial agent may represent, acquire, update, and use large amounts of knowledge has long been an important research challenge in artificial intelligence. The quantity of knowledge, or knowing a lot, may be nicely thought of as making and updating many predictions about many different courses of action. This predictive approach to knowledge ensures the knowledge is grounded in and learned from low-level data generated by an autonomous agent interacting with the world. Because predictive knowledge can be maintained without human intervention, its acquisition can potentially scale with available data and computing resources. The idea that knowledge might be expressed as prediction has been explored by Cunningham (1972), Becker (1973), Drescher (1990), Sutton and Tanner (2005), Rafols (2006), and Sutton (2009, 2012). Other uses of predictions include representing state with predictions (Littman, Sutton & Singh 2002; Boots et al. 2010) and modeling partially observable domains (Talvitie & Singh 2011). Unfortunately, technical challenges related to numerical instability, divergence under off-policy sampling, and computational complexity have limited the applicability and scalability of predictive knowledge acquisition in practice.
This thesis explores a new approach to representing and acquiring predictive knowledge on a robot. The key idea is that value functions, from reinforcement learning, can be used to represent policy-contingent declarative and goal-oriented predictive knowledge. We use recently developed gradient-TD methods that are compatible with off-policy learning and function approximation to explore the practicality of making and updating many predictions in parallel, while the agent interacts with the world from continuous inputs on a robot.
The work described here includes both empirical demonstrations of the effectiveness of our new approach and new algorithmic contributions useful for scaling prediction learning. We demonstrate that our value functions are practically learnable and can encode a variety of knowledge with several experiments, all on two different robot platforms: a demonstration of the psychological phenomenon of nexting, learning predictions with refined termination conditions, learning policy-contingent predictions from off-policy samples, and learning procedural goal-directed knowledge. Our results demonstrate the potential scalability of our approach: making and updating thousands of predictions from hundreds of thousands of multi-dimensional data samples, in real time and on a robot, beyond the scalability of related predictive approaches. We also introduce a new online estimate of off-policy learning progress, and demonstrate its usefulness in tracking the performance of thousands of predictions about hundreds of distinct policies. Finally, we conduct a novel empirical investigation of one of our main learning algorithms, GTD(λ), revealing several new insights of particular relevance to predictive knowledge acquisition. All told, the work described here significantly develops the predictive approach to knowledge.
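As a rough illustration of the abstract's central idea, that a value function can encode a prediction about a sensor signal, the following is a minimal sketch of on-policy linear TD(λ) with accumulating traces. It is not the thesis's implementation: the thesis uses gradient-TD methods such as GTD(λ) for off-policy learning, whereas this sketch swaps in plain TD(λ) for brevity. The function names, parameter values, and the constant toy signal are all assumptions made for the example.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(ai * bi for ai, bi in zip(a, b))


def td_lambda_predict(features, cumulants, gamma=0.8, lam=0.9, alpha=0.1):
    """Learn weights w so that dot(w, x_t) estimates the discounted sum of
    future cumulants (for example, a sensor reading) from time t onward.

    Hypothetical helper for illustration only: plain on-policy TD(lambda)
    with accumulating traces, not the GTD(lambda) algorithm from the thesis.
    """
    n = len(features[0])
    w = [0.0] * n  # prediction weights
    z = [0.0] * n  # eligibility trace
    for t in range(len(features) - 1):
        x, x_next = features[t], features[t + 1]
        # TD error: next cumulant plus discounted next prediction,
        # minus the current prediction.
        delta = cumulants[t + 1] + gamma * dot(w, x_next) - dot(w, x)
        # Decay the trace, then add the current feature vector.
        z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
        # Move the weights along the trace in proportion to the TD error.
        w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w


# Toy data (an assumption): a constant signal of 1.0 with a single bias
# feature. The ideal prediction is the discounted sum 1 / (1 - gamma).
steps = 2000
weights = td_lambda_predict([[1.0]] * steps, [1.0] * steps)
print(weights[0])  # approaches 1 / (1 - 0.8) = 5
```

With a discount of 0.8 the prediction covers a horizon of roughly five time steps. Learning many such predictions in parallel would amount to keeping one weight vector and trace per prediction over a shared feature vector.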
- Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
- Citation for previous publication
Multi-timescale Nexting in a Reinforcement Learning Robot, Joseph Modayil, Adam White, Richard Sutton, Adaptive Behavior, published online February 7, 2014.
Scaling life-long off-policy learning, Adam White, Joseph Modayil, Richard S. Sutton. IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL), 2013.
Multi-timescale nexting in a reinforcement learning robot, Modayil, Joseph and White, Adam and Sutton, Richard. From Animals to Animats 12, 2012.
Acquiring Diverse Predictive Knowledge in Real Time by Temporal-difference Learning, Modayil, Joseph and White, Adam and Pilarski, Patrick M and Sutton, Richard S. Systems, Man, and Cybernetics (SMC), 2012.
Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. Sutton, Richard S and Modayil, Joseph and Delp, Michael and Degris, Thomas and Pilarski, Patrick M and White, Adam and Precup, Doina. Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems, 2011.
Surprise and curiosity for big data robotics, Adam White, Joseph Modayil, Richard Sutton, Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 13678112
Last modified: 2016:06:24 17:55:58-06:00
Original checksum: e5f782f862f4bfbb5f293df86890d82c
Well formed: false
Status message: Invalid page tree node offset=9763254
Status message: Unexpected error in findFonts java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary offset=4974
Page count: 36