Calibration Models for Real-World Deployment of Reinforcement Learning Agents
-
- Author / Creator
- Coblin, Jordan Frederick
-
The sensitivity of reinforcement learning algorithm performance to hyperparameter choices poses a significant hurdle to deploying these algorithms in the real world, where sampling can be limited by speed, safety, or other system constraints. One way to mitigate this is to learn a calibration model from offline data logs and use this model to simulate trajectories for hyperparameter tuning. Calibration models have shown preliminary success on simple simulated problems, but more work is needed to understand the desirable properties of such models and to test their feasibility in a real-world setting.
In this work, we take the first steps toward characterizing desirable properties of calibration models and provide the first application of a calibration model to a real-world industrial prediction task. We investigate several measures that can be used to assess model quality and evaluate calibration model implementations according to these measures. The calibration models are then tested on a prediction task for sensors in a water treatment plant (WTP) located in Alberta, Canada. We find that various types of calibration models can simulate simple environments, whereas generalizing models tend to collapse due to compounding prediction error in the more complex real-world setting. We show that a non-parametric k-nearest neighbors calibration model with a Laplacian distance metric produces realistic rollouts over long horizons in the WTP setting and can be used successfully for hyperparameter tuning. Finally, to help bridge the gap toward real-world deployment, we demonstrate how this model can be scaled to a year's worth of data.
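The k-nearest neighbors calibration model mentioned in the abstract can be illustrated with a minimal sketch. This is a hypothetical illustration, not the thesis's implementation: it assumes logged transitions of the form (state, action, next state) with a discrete action set, and an optional `embed` function standing in for the representation behind the Laplacian distance metric (plain Euclidean distance in the embedded space is used here). The class and parameter names are invented for illustration. Because every simulated next state is copied from the logs, rollouts stay on the support of the data, which is one way to avoid the compounding prediction error that affects generalizing models.

```python
import numpy as np


class KNNCalibrationModel:
    """Hypothetical sketch of a non-parametric kNN calibration model.

    Assumption: `embed` maps a raw state into some representation space
    (e.g. a learned Laplacian representation); Euclidean distance in that
    space stands in for the Laplacian distance metric. Identity by default.
    """

    def __init__(self, states, actions, next_states, k=3, embed=None, seed=0):
        self.states = np.asarray(states, dtype=float)
        self.actions = np.asarray(actions)
        self.next_states = np.asarray(next_states, dtype=float)
        self.k = k
        self.embed = embed if embed is not None else (lambda x: x)
        # Embed every logged state once so queries only embed the query state.
        self.embedded = np.array([self.embed(s) for s in self.states])
        self.rng = np.random.default_rng(seed)

    def step(self, state, action):
        # Only consider logged transitions that took the same action.
        idx = np.flatnonzero(self.actions == action)
        if idx.size == 0:
            raise ValueError("no logged transition with this action")
        query = self.embed(np.asarray(state, dtype=float))
        dists = np.linalg.norm(self.embedded[idx] - query, axis=1)
        k = min(self.k, idx.size)
        nearest = idx[np.argsort(dists)[:k]]
        # Return the logged next state of a randomly chosen neighbour,
        # so simulated states are always real states from the logs.
        return self.next_states[self.rng.choice(nearest)]

    def rollout(self, start_state, policy, horizon):
        # Chain `horizon` simulated steps starting from `start_state`.
        traj = [np.asarray(start_state, dtype=float)]
        for _ in range(horizon):
            traj.append(self.step(traj[-1], policy(traj[-1])))
        return np.array(traj)
```

For example, with three logged one-dimensional states that cycle 0 → 1 → 2 → 0 under a single action and `k=1`, `KNNCalibrationModel([[0.0], [1.0], [2.0]], [0, 0, 0], [[1.0], [2.0], [0.0]], k=1).rollout([0.0], lambda s: 0, 4)` replays that cycle. Larger `k` trades exact replay for more varied, but still logged, trajectories.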
-
- Graduation date
- Fall 2024
-
- Type of Item
- Thesis
-
- Degree
- Master of Science
-
- License
- This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.