Evaluating Search Spaces for Programmatic Policies in POMDPs

Carvalho, Tales Henrique

doi:10.7939/r3-v1vf-8x64

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Carvalho, Tales Henrique
Abstract

Searching for programmatic policies to solve a reinforcement learning problem can be challenging, particularly when dealing with domain-specific languages (DSLs) that define policies with internal states for partially observable Markov decision processes (POMDPs). Such DSLs induce complex and discontinuous search spaces that often require combinatorial search. To avoid searching in the programmatic space, the recent methods LEAPS and HPRL learn latent spaces of DSLs, which are then used to define policies for POMDPs. In addition to being trained to reconstruct programs from their embeddings, these latent spaces are trained to achieve locality in program behavior, with the expectation that vectors close in the latent space decode to programs that behave similarly. In this work, we show that a hill-climbing search in the original programmatic space, which is induced by the DSL itself and requires no learning, achieves a similar locality measure in program behavior and significantly outperforms LEAPS and HPRL in finding high-reward policies. We further analyze the optimization topology induced by the neighborhood function of each search space in conjunction with the reward function of the POMDP. We show that a local search algorithm is more likely to stop at local maxima when searching for high-reward policies in the latent space than when searching in the original programmatic space. This result implies that the programmatic space is more conducive to local search and explains its superior performance.
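The hill-climbing search over a neighborhood function mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the token vocabulary `TOKENS`, the fixed-length tuple representation of a program, the one-token-change neighborhood in `neighbors`, and the `toy_reward` function (a stand-in for an episodic return in a POMDP) are all hypothetical.

```python
# Hypothetical toy vocabulary; NOT the DSL used in the thesis.
TOKENS = ["move", "turnLeft", "turnRight", "pickMarker"]

# Hypothetical hidden target used only by the toy reward below.
TARGET = ("move", "move", "turnLeft", "pickMarker")


def neighbors(program):
    """Return every program that differs from `program` in exactly one token."""
    result = []
    for i, tok in enumerate(program):
        for alt in TOKENS:
            if alt != tok:
                result.append(program[:i] + (alt,) + program[i + 1:])
    return result


def toy_reward(program):
    """Stand-in reward: fraction of positions that match the hidden target."""
    return sum(a == b for a, b in zip(program, TARGET)) / len(TARGET)


def hill_climb(initial, reward_fn, neighbors_fn):
    """Greedy local search: move to the best neighbor until none improves."""
    current, current_reward = initial, reward_fn(initial)
    while True:
        best = max(neighbors_fn(current), key=reward_fn)
        best_reward = reward_fn(best)
        if best_reward <= current_reward:
            # No neighbor is strictly better: a local maximum of this topology.
            return current, current_reward
        current, current_reward = best, best_reward


if __name__ == "__main__":
    start = ("turnRight",) * 4
    program, reward = hill_climb(start, toy_reward, neighbors)
    print(program, reward)
```

In this toy setting every single-token fix raises the reward, so the search has no local maxima other than the target. With a less smooth reward or a different neighborhood function, the same loop can stop early at a local maximum, which is the kind of topology effect the abstract compares across search spaces.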
Subjects / Keywords
- programmatic policy
- reinforcement learning
Graduation date

Spring 2024
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-v1vf-8x64
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Lelis, Levi (Computing Science)