






This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations

Dynamic Tuning of PI-Controllers based on Model-free Reinforcement Learning Methods Open Access


Subject / Keyword
Dynamic tuning
Process Control
Reinforcement Learning
Degree grantor
University of Alberta
Author or creator
Abbasi Brujeni, Lena
Supervisor and department
Dr. Jong Min Lee (Chemical and Materials Engineering)
Dr. Sirish L. Shah (Chemical and Materials Engineering)
Examining committee member and department
Dr. Richard Sutton (Computing Science)
Dr. Vinay Prasad (Chemical and Materials Engineering)
Department
Department of Chemical and Materials Engineering

Degree
Master of Science
Abstract
In this thesis, a Reinforcement Learning (RL) method called Sarsa is used to dynamically tune a PI-controller for a Continuous Stirred Tank Heater (CSTH) experimental setup. The proposed approach uses an approximate model of the process to train the RL agent in a simulation environment before implementation on the real plant, so that the agent starts from a reasonably stable policy. Learning without any information about the process dynamics is not practically feasible, both because of the large amount of data (and hence time) that the RL algorithm requires and because of safety concerns. The process is modeled with a First Order Plus Time Delay (FOPTD) transfer function, because most chemical processes can be adequately represented by this class of transfer functions. The time-delay term makes these models inherently more difficult for RL methods, since the effect of a control action appears in the output only after the delay has elapsed. To handle the continuous state space, RL methods must be combined with generalization (function approximation) techniques. Here, a parameterized quadratic function approximator is used in the region close to the origin, and k-nearest neighbor (k-NN) function approximation is used in the region far from it. Each of these methods has drawbacks when applied on its own, so they are combined to overcome these shortcomings. The proposed RL-based PI-controller is first trained in the simulation environment, and the resulting policy is then used as the starting policy of the RL agent on the experimental setup. Because of the existing plant-model mismatch, the performance of the RL-based PI-controller with this initial policy is not as good as in simulation; however, further training on the real plant improves it significantly.
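The combination of the two approximators with the Sarsa update can be illustrated with a minimal Python/NumPy sketch. It rests on assumptions that the abstract does not state: the state `s` is expressed as deviations from the set-point (so the origin is the set-point), the action `a` is a continuous vector such as adjustments to the PI gains, the boundary between the two regions is a ball of hypothetical radius `radius`, and the k-NN memory simply stores every far-region sample. The class `HybridQ`, the function `sarsa_step`, and all parameter values are hypothetical names for illustration, not the thesis's implementation.

```python
import numpy as np


class HybridQ:
    """Hypothetical hybrid action-value approximator (illustrative sketch only).

    Near the origin (||s|| <= radius): Q(s, a) = z^T W z with z = [s, a, 1],
    a quadratic form whose parameters W enter linearly.
    Elsewhere: Q(s, a) is the mean stored value of the k nearest (s, a) samples.
    """

    def __init__(self, state_dim, action_dim, radius=1.0, k=3):
        n = state_dim + action_dim + 1
        self.W = np.zeros((n, n))  # quadratic-form parameters (near region)
        self.radius = radius       # assumed boundary between the two regions
        self.k = k                 # number of neighbors averaged (far region)
        self.keys = []             # stored [s, a] points (k-NN memory)
        self.vals = []             # value estimates attached to those points

    def _near(self, s):
        return np.linalg.norm(s) <= self.radius

    def _knn(self, x):
        dists = np.linalg.norm(np.array(self.keys) - x, axis=1)
        return np.argsort(dists)[: self.k]

    def value(self, s, a):
        s, a = np.asarray(s, dtype=float), np.asarray(a, dtype=float)
        if self._near(s):
            z = np.concatenate([s, a, [1.0]])
            return float(z @ self.W @ z)
        if not self.keys:
            return 0.0             # empty memory: default estimate
        x = np.concatenate([s, a])
        return float(np.mean([self.vals[i] for i in self._knn(x)]))

    def update(self, s, a, target, alpha):
        """Move Q(s, a) toward `target` with step size `alpha`."""
        s, a = np.asarray(s, dtype=float), np.asarray(a, dtype=float)
        old = self.value(s, a)
        delta = target - old
        if self._near(s):
            z = np.concatenate([s, a, [1.0]])
            # The gradient of z^T W z with respect to W is the outer product z z^T.
            self.W += alpha * delta * np.outer(z, z)
        else:
            x = np.concatenate([s, a])
            if self.keys:
                for i in self._knn(x):
                    self.vals[i] += alpha * delta  # pull the neighbors toward the target
            self.keys.append(x)                    # and remember this sample
            self.vals.append(old + alpha * delta)


def sarsa_step(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """One Sarsa update: bootstrap on Q(s', a') for the action a' actually taken next."""
    target = r + gamma * q.value(s_next, a_next)
    q.update(s, a, target, alpha)
```

A quadratic form is a reasonable choice near the set-point because, for a linear system with a quadratic cost, the action-value function is exactly quadratic in the state and action (this rationale is our own, not taken from the abstract); k-NN avoids committing to a global functional form far from the origin. The unbounded memory is a simplification that a practical version would cap or prune.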
For comparison, PI-controllers tuned by the Internal Model Control (IMC) method are also tested. These are among the most commonly used feedback controllers, and their performance likewise degrades because of the inevitable plant-model mismatch; improving it requires re-tuning them based on a more accurate model of the process. Experiments are carried out for both set-point tracking and disturbance rejection, and in both cases the RL-based PI-controller adapts successfully. When a disturbance enters the process, the performance of the proposed model-free self-tuning PI-controller initially degrades more than that of the IMC-tuned controllers. However, its adaptability addresses this problem: after being trained to handle disturbances in the process, the RL agent obtains an improved control policy that successfully returns the output to the set-point.
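For reference, the IMC-tuned baseline can be sketched in Python. The abstract does not state which IMC rule is used, so this assumes one common variant for a FOPTD model K e^(-θs)/(τs + 1): approximating e^(-θs) ≈ 1 - θs in the IMC design gives Kc = τ/(K(λ + θ)) and τI = τ, where λ is the closed-loop time-constant tuning parameter (this coincides with Skogestad's SIMC rule whenever τ ≤ 4(λ + θ)). The function names, the zero-order-hold discretization, and all numbers are hypothetical. The simulation lets the same gains be tried on a plant whose gain and delay differ from the model, a simple stand-in for plant-model mismatch.

```python
import math
from collections import deque


def imc_pi(K, tau, theta, lam):
    """PI gains from IMC design on a FOPTD model with exp(-theta*s) ~ 1 - theta*s."""
    Kc = tau / (K * (lam + theta))  # proportional gain
    tau_i = tau                     # integral time cancels the model's lag
    return Kc, tau_i


def simulate_pi_foptd(K, tau, theta, Kc, tau_i, dt=0.5, t_end=100.0, setpoint=1.0):
    """Set-point step on a FOPTD plant under discrete PI control.

    Returns (IAE, peak output, final output). The plant is discretized exactly
    under a zero-order hold, which requires theta to be a multiple of dt.
    """
    delay = round(theta / dt)
    if not math.isclose(delay * dt, theta):
        raise ValueError("theta must be an integer multiple of dt")
    a = math.exp(-dt / tau)      # discrete-time pole of the first-order lag
    pipe = deque([0.0] * delay)  # control moves still travelling through the dead time
    y = integral = iae = peak = 0.0
    for _ in range(round(t_end / dt)):
        e = setpoint - y
        iae += abs(e) * dt
        integral += e * dt
        pipe.append(Kc * (e + integral / tau_i))    # positional PI law
        y = a * y + K * (1.0 - a) * pipe.popleft()  # y[k+1] driven by u[k - delay]
        peak = max(peak, y)
    return iae, peak, y


# Gains designed on a hypothetical nominal model: K = 2, tau = 10, theta = 2, lambda = 2.
Kc, tau_i = imc_pi(K=2.0, tau=10.0, theta=2.0, lam=2.0)
iae_nom, peak_nom, y_nom = simulate_pi_foptd(2.0, 10.0, 2.0, Kc, tau_i)
# Same gains on a hypothetical "true" plant with larger gain and delay (mismatch).
iae_mis, peak_mis, y_mis = simulate_pi_foptd(3.0, 10.0, 3.0, Kc, tau_i)
print(f"Kc={Kc}, tau_i={tau_i}")
print(f"nominal:    IAE={iae_nom:.2f}, peak={peak_nom:.2f}")
print(f"mismatched: IAE={iae_mis:.2f}, peak={peak_mis:.2f}")
```

With these numbers the mismatched loop overshoots more than the nominal one, because the higher true gain and longer delay reduce its phase margin; re-computing the gains from an updated model is the re-tuning step that the IMC design relies on.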
Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

File Details

File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 597017 bytes
Last modified: 2015:10:22 02:54:21-06:00
Filename: Abbasi Brujeni_Lena_Spring 2010.pdf
Original checksum: 41af90e28cdcab1518042a14500550df
Well formed: false
Valid: false
Status message: Unexpected error in findFonts java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary offset=3080
Page count: 95