Reinforcement Learning-based Process Control Under Sensory Uncertainty

  • Author / Creator
    Dogru, Oguzhan
  • Industrial processes have complex, interdependent, and sometimes uncontrollable or unobservable features, and they are subject to a variety of uncertainties such as operational fluctuations, sensor noise, process anomalies, human involvement, and market volatility. Despite this unpredictability, industrial applications strive for consistent operational excellence in terms of product quality, economic benefits, process safety, and environmental sustainability. These operational criteria call for intelligent solutions that can improve a wide range of operations without requiring substantial modelling effort. Reinforcement learning (RL), as a data-driven method, can provide a practical answer to such issues by employing various types of sensory information.
    A robust interface tracking algorithm is the first contribution of this thesis. In contrast to existing methods, which rely on hand-crafted features, the proposed algorithm provides a tracking methodology composed of convolutional neural networks and long short-term memory networks that are jointly optimized for interface tracking. Without any explicit models or restrictive assumptions, this structure integrates neighbouring spatial and spatiotemporal elements. Unlike supervised/unsupervised learning methods, the proposed RL-based tracking algorithm requires only a few images, which can be labelled quickly and accurately by a user or a sensor. This agent outperforms some of the existing methods in terms of robustness, which is one of the most important requirements in state estimation and control. Finally, by employing a dimensionality reduction technique, our work contributes to deep learning-based RL solutions.
    The second contribution develops an RL-based controller that takes safety requirements into account. To this end, the suggested approach combines a deep actor-critic agent with random setpoint initialization and a Lagrangian-based soft-constrained learning scheme. An example demonstrated that the soft-constrained approach could provide smooth state transitions while accelerating the offline training phase with several workers. In addition, an exploration metric inspired by set theory was developed.
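    The idea behind a Lagrangian-based soft constraint can be illustrated on a toy problem. The sketch below is not the thesis's actor-critic scheme; it only shows the generic primal-dual update, assuming a hypothetical scalar objective f(x) = -(x - 2)^2 and a hypothetical constraint x <= 1. The decision variable ascends the Lagrangian f(x) - lam * (x - 1), while the multiplier lam grows whenever the constraint is violated and is kept non-negative.

```python
def primal_dual_ascent(steps=2000, lr_x=0.05, lr_lam=0.05):
    """Toy primal-dual iteration (illustrative assumption, not the thesis method).

    Maximizes f(x) = -(x - 2)^2 subject to x <= 1 using the Lagrangian
    L(x, lam) = f(x) - lam * (x - 1) with lam >= 0.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient ascent on the Lagrangian with respect to x.
        grad_x = -2.0 * (x - 2.0) - lam
        x += lr_x * grad_x
        # Dual step: raise lam when x > 1, lower it otherwise, never below zero.
        lam = max(0.0, lam + lr_lam * (x - 1.0))
    return x, lam


x_opt, lam_opt = primal_dual_ascent()
print(f"x = {x_opt:.4f}, lam = {lam_opt:.4f}")
```

    For this toy problem the constrained optimum is x = 1 with multiplier lam = 2, which is the point the iteration settles at. In an RL setting the same pattern would presumably be applied to policy parameters and an expected constraint cost rather than to a scalar.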
    The third contribution addresses uncertainty in the constrained reward/cost function, which is often employed in RL and process control. A reduced signal-to-noise ratio in a process can permanently deteriorate the control policy and result in poor tracking/control performance. To account for sensory noise, the proposed method models the reward/cost function as a dynamic process, along with transition and observation models. Using a constrained particle filter, the proposed method estimates the first and second moments of the constrained reward.
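    As a rough illustration of moment estimation with a constrained particle filter, the sketch below tracks a bounded scalar reward from noisy observations. Everything specific here is an assumption made for the example and is not taken from the thesis: the random-walk transition model, the Gaussian observation model, the noise levels, and the choice to enforce the bounds by clipping particles onto the feasible interval.

```python
import numpy as np


def constrained_particle_filter(observations, r_min, r_max, n_particles=500,
                                process_std=0.05, obs_std=0.5, seed=0):
    """Bootstrap particle filter for a reward constrained to [r_min, r_max].

    Returns the filtered mean and variance of the reward at every step.
    All models and parameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    particles = rng.uniform(r_min, r_max, size=n_particles)
    means, variances = [], []
    for y in observations:
        # Transition model (assumed): random walk on the latent reward.
        particles = particles + rng.normal(0.0, process_std, size=n_particles)
        # Constraint handling (assumed): clip particles onto the feasible set.
        particles = np.clip(particles, r_min, r_max)
        # Observation model (assumed): Gaussian noise around the reward.
        log_w = -0.5 * ((y - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # First and second (central) moments of the weighted particle set.
        mean = np.sum(w * particles)
        var = np.sum(w * (particles - mean) ** 2)
        means.append(mean)
        variances.append(var)
        # Multinomial resampling to counter weight degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return np.array(means), np.array(variances)


# Usage: a constant true reward of -0.5 observed through heavy noise.
demo_rng = np.random.default_rng(1)
noisy_rewards = -0.5 + demo_rng.normal(0.0, 0.5, size=300)
mean_est, var_est = constrained_particle_filter(noisy_rewards, -1.0, 0.0)
print(mean_est[-1], var_est[-1])
```

    Because every particle lies inside the bounds, the estimated mean cannot leave the feasible interval, which is the property that distinguishes this from an unconstrained filter. Other constraint-handling choices, such as rejecting infeasible particles, would fit the same structure.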
    The fourth contribution addresses the problem of dimensionality increase during online skew state estimation. Although a closed skew-normal distribution increases the degrees of freedom in state estimation, its location and scale parameters increase in size at the end of each filtering stage. This growth slows down the inferential calculation, making closed-form solutions impractical and online inference infeasible in the long term. With dimensionality reduction rigorously formulated as an optimization problem, empirical analyses were carried out to compare various statistical distance functions and optimization techniques. Finally, the proposed skew estimation scheme was applied to problems involving reward estimation and state estimation.
    The fifth contribution proposes an autonomous PID tuning scheme. Since complex industrial plants can utilize thousands of control loops with unknown models, tuning the PID controllers can be time-consuming. The proposed algorithm is based on a constrained contextual bandit that first tunes the PID controllers on step-response models and then gradually learns the plant-model mismatch through online interaction.
    The sixth contribution is the development of an autonomous MPC tuner and its integration with an autonomous advanced control infrastructure. Although various traditional approaches may design MPC parameters offline, there can be significant performance deterioration due to model-plant mismatch or operational changes. Additionally, establishing specific performance criteria using complex functions can be difficult. Using smart trial-and-error, however, the proposed RL agent can produce optimal solutions to such challenges. This modular, model-independent agent can be pre-trained on step-response models and then integrated into more complicated schemes. By integrating all agents, controllers, and filters, this contribution also includes a proof of concept of an autonomous process control scheme.

  • Subjects / Keywords
  • Graduation date
    Spring 2023
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.