Calibration Models for Real-World Deployment of Reinforcement Learning Agents

Coblin, Jordan Frederick

doi:10.7939/r3-rvx0-3743

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Coblin, Jordan Frederick
Abstract

The performance of reinforcement learning algorithms is sensitive to hyperparameter choices, which poses a significant hurdle to deploying these algorithms in the real world, where sampling can be limited by speed, safety, or other system constraints. To mitigate this, one approach is to learn a calibration model from offline data logs and use this model to simulate trajectories for the purpose of hyperparameter tuning. While there has been preliminary success applying calibration models to simple simulated problems, more work is needed to understand the desirable properties of such models and to test their feasibility in a real-world setting.

In this work, we take the first steps toward characterizing desirable properties of calibration models and provide the first application of a calibration model to a real-world industrial prediction task. We investigate several measures that can be used to understand model quality and evaluate calibration model implementations according to these measures. The calibration models are then tested on a prediction task for sensors in a water treatment plant (WTP) located in Alberta, Canada. We find that various types of calibration models can be used to simulate simple environments, while generalizing models tend to collapse due to compounding prediction error in the more complex real-world setting. We show that a non-parametric k-nearest neighbors calibration model with a Laplacian distance metric is able to produce realistic rollouts over long horizons in the WTP setting, and can be used successfully for hyperparameter tuning. Finally, we aim to bridge the gap toward real-world deployment and demonstrate how this model can be scaled to a year's worth of data.
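
To make the idea of a non-parametric calibration model concrete, the following is a minimal sketch and not the thesis implementation. It assumes logged transitions of the form (state, action, reward, next state) and simulates a step by sampling the logged successor of one of the k nearest stored states. The class name, its parameters, and the toy data are hypothetical, and plain Euclidean distance is used only as a stand-in: the Laplacian distance metric described in the abstract is not reproduced here.

```python
import numpy as np


class KNNCalibrationModel:
    """Hypothetical k-nearest-neighbors model over logged transitions."""

    def __init__(self, states, actions, rewards, next_states,
                 k=3, distance=None, seed=0):
        self.states = np.asarray(states, dtype=float)
        self.actions = np.asarray(actions)
        self.rewards = np.asarray(rewards, dtype=float)
        self.next_states = np.asarray(next_states, dtype=float)
        self.k = k
        # ASSUMPTION: Euclidean distance as a placeholder. The thesis uses a
        # Laplacian distance metric, which could be passed in via `distance`.
        self.distance = distance or (
            lambda batch, s: np.linalg.norm(batch - s, axis=1)
        )
        self.rng = np.random.default_rng(seed)

    def step(self, state, action):
        """Simulate one step by returning a logged (reward, next_state)."""
        # Only consider logged transitions that took the same action.
        idx = np.flatnonzero(self.actions == action)
        if idx.size == 0:
            raise ValueError(f"no logged transitions for action {action!r}")
        d = self.distance(self.states[idx], np.asarray(state, dtype=float))
        k = min(self.k, idx.size)
        nearest = idx[np.argsort(d)[:k]]
        # Sample uniformly among the k nearest logged transitions.
        j = self.rng.choice(nearest)
        return self.rewards[j], self.next_states[j]

    def rollout(self, start_state, policy, horizon):
        """Roll the model forward for `horizon` steps under `policy`."""
        state = np.asarray(start_state, dtype=float)
        trajectory = []
        for _ in range(horizon):
            action = policy(state)
            reward, next_state = self.step(state, action)
            trajectory.append((state, action, reward, next_state))
            state = next_state
        return trajectory


# Toy logged data: a four-state cycle with a single action (illustrative only).
model = KNNCalibrationModel(
    states=[[0.0], [1.0], [2.0], [3.0]],
    actions=[0, 0, 0, 0],
    rewards=[0.0, 0.0, 0.0, 1.0],
    next_states=[[1.0], [2.0], [3.0], [0.0]],
    k=1,
)
trajectory = model.rollout([0.0], policy=lambda s: 0, horizon=4)
total_reward = sum(r for _, _, r, _ in trajectory)
```

In this sketch every simulated step returns a successor state that was actually logged, so a rollout never leaves the recorded data. Hyperparameter tuning would then amount to running a learning agent against `step` for each candidate configuration and comparing the results, which is outside the scope of this sketch.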
Subjects / Keywords
- Reinforcement Learning
- Water Treatment
Graduation date

Fall 2024
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-rvx0-3743
License

This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- White, Adam (Computing Science)