Evaluating Search Spaces for Programmatic Policies in POMDPs
-
- Author / Creator
- Carvalho, Tales Henrique
-
Searching for programmatic policies to solve a reinforcement learning problem can be challenging, particularly when dealing with domain-specific languages (DSLs) that define policies with internal states for partially observable Markov decision processes (POMDPs). This is because such DSLs lead to complex and discontinuous search spaces, often requiring combinatorial search processes. To avoid searching in the programmatic space, the recent works LEAPS and HPRL learn latent spaces of DSLs, which are used to define policies for POMDPs. Aside from reconstructing programs from their embedding representations, these spaces are trained to achieve locality in program behavior, with the expectation that vectors close in the latent space decode to programs that behave similarly. In this work, we show that searching using a hill-climbing process in the original programmatic space, induced by the DSL itself and requiring no learning, achieves a similar locality measure in program behavior and significantly outperforms LEAPS and HPRL in finding high-reward policies. We further analyze the optimization topology induced by the neighborhood function of each search space in conjunction with the reward function of the POMDP. We show that a local search algorithm is more likely to stop in local maxima regions when searching for high-reward policies in the latent space than when searching in the original programmatic space. This result implies that the programmatic space is more conducive to local search and explains its superior performance.
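As a rough illustration of what hill climbing over a neighborhood function looks like, the following is a minimal Python sketch. It is not the thesis's implementation: the token set `TOKENS`, the one-token-substitution `neighbors` function, and `toy_reward` are all hypothetical stand-ins chosen for brevity, whereas the thesis searches over programs of a real DSL and scores them by the reward obtained in a POMDP.

```python
from typing import Callable, List, Tuple

# Toy stand-in for a DSL program: a fixed-length sequence of tokens.
# The token names are hypothetical and not taken from the thesis.
Program = Tuple[str, ...]
TOKENS = ("left", "right", "forward", "wait")


def neighbors(program: Program) -> List[Program]:
    """Return every program that differs from `program` in exactly one token."""
    result = []
    for i, current in enumerate(program):
        for token in TOKENS:
            if token != current:
                result.append(program[:i] + (token,) + program[i + 1:])
    return result


def hill_climb(
    start: Program,
    reward: Callable[[Program], float],
    max_steps: int = 1000,
) -> Tuple[Program, float]:
    """Move to the best neighbor until no neighbor strictly improves the reward."""
    current, current_reward = start, reward(start)
    for _ in range(max_steps):
        best = max(neighbors(current), key=reward)
        best_reward = reward(best)
        if best_reward <= current_reward:
            break  # no improving neighbor: the search stops at a local maximum
        current, current_reward = best, best_reward
    return current, current_reward


def toy_reward(program: Program) -> float:
    """Hypothetical reward: fraction of positions matching a fixed target program."""
    target = ("forward", "forward", "left", "forward")
    return sum(a == b for a, b in zip(program, target)) / len(target)
```

In this toy setting every improving step fixes one token, so the search has no local maxima other than the target. The abstract's point is that real search spaces are not this benign, and that how often a local search gets stuck depends on the neighborhood function of the space being searched.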
-
- Subjects / Keywords
-
- Graduation date
- Spring 2024
-
- Type of Item
- Thesis
-
- Degree
- Master of Science
-
- License
- This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.