
Reinforcement Learning based Controller Design for Nonlinear Process Control

  • Author / Creator
    Shafi, Hareem
  • Reinforcement learning (RL) has received wide attention in various fields lately. Model-free RL offers data-driven solutions that learn the control strategy directly from interaction with the process, without requiring a process model. This is especially beneficial for nonlinear processes, where an accurate process model may not be readily available, and it circumvents the model identification step. A model-free controller can also be retrained to recover performance after process shifts or in the presence of process noise. In contrast, traditional model-based control methods require an explicit process model, and the performance of parametric models deteriorates over time under process shifts or unmeasured disturbances. However, even with learning schemes such as deep deterministic policy gradient (DDPG), deep Q-networks (DQN), and actor-critic methods, convergence to an optimal policy in process control remains a persistent challenge.

    This thesis focuses on integrating RL-based methods into the process control domain. The first part addresses multivariable control of chemical processes with continuous state and action spaces. A parallel learning architecture is used to improve control quality and convergence to an optimal policy through better exploration of the state and action spaces. As an example, a centralized RL agent successfully learns an effective servo-tracking control policy for a quadruple tank system, learning directly from interactions with the process while keeping the process operational.
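    The parallel-exploration idea can be sketched as follows. This is a minimal illustration, not the thesis implementation: the `TankEnv` dynamics, the purely random exploration policies, and the worker/step counts are hypothetical stand-ins. The point is the structure — several differently seeded workers explore copies of the process and pool their transitions into one shared replay buffer, from which a centralized learner draws minibatches.

```python
import random
from collections import deque

class TankEnv:
    """Toy single-tank level process (hypothetical dynamics):
    the level responds to an inflow action in [0, 1]."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.level = 0.5

    def step(self, action):
        # First-order level response plus small process noise.
        self.level += 0.1 * (action - 0.5) + self.rng.gauss(0, 0.01)
        self.level = min(max(self.level, 0.0), 1.0)
        reward = -abs(self.level - 0.8)  # track a 0.8 level setpoint
        return self.level, reward

def collect_parallel(n_workers=4, steps=50):
    """Each worker explores its own copy of the environment with a
    differently seeded policy; all transitions go into one shared buffer."""
    buffer = deque(maxlen=10_000)
    for w in range(n_workers):
        env = TankEnv(seed=w)
        policy_rng = random.Random(100 + w)
        state = env.level
        for _ in range(steps):
            action = policy_rng.random()          # exploratory action
            next_state, reward = env.step(action)
            buffer.append((state, action, reward, next_state))
            state = next_state
    return buffer

buffer = collect_parallel()
# The centralized agent would update its policy from pooled experience:
batch = random.Random(0).sample(list(buffer), 32)
```

    Because the workers start from different seeds, the pooled buffer covers more of the state-action space than a single trajectory would, which is the mechanism the abstract credits for the improved convergence.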

    The second part of the thesis develops a hierarchical RL-based constrained controller for higher-level optimization of the Primary Separation Vessel (PSV). A supervisory RL agent improves the bitumen recovery rate by manipulating the interface level setpoint, while a lower-level RL agent maintains control of the froth-middlings interface level despite the nonlinear nature of the process and the unpredictable ore composition of the slurry fed into the PSV. This unpredictability also necessitates keeping the tailings density below a sanding threshold, which is achieved by a non-interacting sanding prevention RL agent that manipulates the tailings flowrate. For the interface level control loop, a two-phase learning scheme based on behavioral cloning is also proposed to promote stable exploration of the state space; in simulation, this scheme improved convergence to a near-optimal policy. The proposed hierarchical structure successfully improves the bitumen recovery rate through interface level manipulation while preventing sanding, demonstrating the feasibility of such approaches for chemical processes.
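    The wiring of the three-agent hierarchy can be sketched as below. Everything here is a hypothetical stand-in for the trained RL policies and the real PSV dynamics — `PSVSim`, the gains, the 0.6 setpoint, and the 1.35 sanding threshold are illustrative only. What the sketch shows is the structure: a supervisory policy adjusts the interface level setpoint from a recovery signal, a lower-level policy tracks that setpoint, and a separate, non-interacting policy manipulates the tailings flow to hold density below the sanding threshold.

```python
import random

class PSVSim:
    """Toy primary separation vessel (hypothetical dynamics): the
    interface level and tailings density respond to the two flows."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.level = 0.5       # froth-middlings interface level
        self.density = 1.3     # tailings density

    def step(self, middlings_flow, tailings_flow):
        ore_noise = self.rng.gauss(0, 0.02)  # unpredictable ore composition
        self.level += 0.05 * (middlings_flow - tailings_flow) + ore_noise
        self.density += 0.02 * (1.0 - tailings_flow) + 0.5 * ore_noise
        # Proxy for bitumen recovery, peaking at the assumed best level.
        return max(0.0, 1.0 - abs(self.level - 0.6))

def supervisory_policy(recovery):
    """Upper level: nudge the interface level setpoint based on recovery."""
    return 0.6 if recovery > 0.5 else 0.55

def level_policy(level, setpoint):
    """Lower level: proportional-like tracking of the current setpoint."""
    return 0.5 + 2.0 * (setpoint - level)

def sanding_policy(density, threshold=1.35):
    """Non-interacting agent: raise tailings flow near the sanding threshold."""
    return 1.0 if density > threshold else 0.5

sim = PSVSim()
setpoint = 0.6
for _ in range(100):
    u_level = level_policy(sim.level, setpoint)     # tracks setpoint
    u_tail = sanding_policy(sim.density)            # guards density
    recovery = sim.step(u_level, u_tail)
    setpoint = supervisory_policy(recovery)         # hierarchy closes here
```

    The key design point mirrored from the abstract is that the sanding prevention loop acts on its own manipulated variable (tailings flow), so it does not interfere with the level-tracking loop even though both respond to the same ore-composition disturbance.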

  • Subjects / Keywords
  • Graduation date
    Spring 2020
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-banv-sv83
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.