Robust Probabilistic Principal Component Analysis Based Modeling with Gaussian Mixture Noises

  • Author / Creator
    Sadeghian, Anahita
  • Most industrial plants are heavily instrumented with a large number of sensors and analyzers that provide the data needed for process control and monitoring. However, online and fast-rate measurements are not always available because of the restricted availability and/or reliability of measurement techniques and devices. Even where appropriate measuring devices exist, some key process variables are still determined offline by laboratory sample analysis or by often unreliable online analyzers. Such methods of data acquisition are time consuming, often expensive in the long run, and introduce delays and discontinuities into the data. Moreover, advanced process monitoring and control techniques are key to achieving profitability, meeting safety requirements and operating environmentally friendly processes, and developing them requires operational data to be recorded and analyzed.

    A popular way to make reliable data available quickly and at lower cost is to use predictive models. Predictive models are mathematical models of the process, developed from the plant's history using the available data. Where possible, adding first-principles equations can improve the accuracy of the model. The data should be relevant, reasonably clean, and cover a meaningful time span of the process under study, but these conditions are rarely met in full. Data quality, namely availability, accuracy, relevance, density and frequency, is a pivotal determinant of a model's performance. Common problematic phenomena include uncertainty, high dimensionality (many recorded features relative to the number of samples), outlying observations, missing records, nonlinearity and non-Gaussianity. This thesis targets a combination of the phenomena most relevant to chemical process records: uncertainty, high dimensionality, outliers and their non-Gaussianity.

    Probabilistic models are well suited to dealing with uncertainty, and principal component analysis (PCA) methods are well suited to handling high dimensionality. Probabilistic principal component analysis (PPCA) based models are therefore adopted as the basis of this research. Conventionally, PPCA based models assume Gaussian noise on both the input and the output observations. This assumption makes the model vulnerable to large random errors, referred to above as outliers. In this thesis, instead of the conventional noise assumption, a mixture noise model with a contaminated Gaussian component is adopted for probabilistic modeling to diminish the adverse effect of outliers, which usually arise from irregular process disturbances, instrumentation failures or transmission problems. This is achieved by downweighting, in the model's output prediction, the noise component that accounts for the contamination. The approach is implemented in several settings, namely a scaled mixture noise model, a location mixture noise model, and a switching noise model that accounts for the dynamic behaviour of the noise, each applied to either the process noise or the measurement noise. These settings are detailed in the main chapters.
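
    The downweighting idea can be illustrated in one dimension. The sketch below is not the thesis's PPCA model: it assumes a scaled mixture noise p(e) = (1 - eps) N(0, sigma2) + eps N(0, lam * sigma2) with lam > 1, treats sigma2, eps and lam as known (the values are illustrative assumptions), and estimates a single common mean by expectation-maximization (EM). The function names are hypothetical. Each sample's posterior probability of belonging to the clean component sets its weight, so a sample attributed to the contamination component contributes with its precision reduced by the factor lam.

```python
import math
import statistics


def log_normal_pdf(e: float, var: float) -> float:
    """Log-density of a zero-mean Gaussian with variance `var`, evaluated at e."""
    return -0.5 * (math.log(2.0 * math.pi * var) + e * e / var)


def clean_responsibility(e: float, sigma2: float, eps: float, lam: float) -> float:
    """Posterior probability that residual e came from the clean component of
    p(e) = (1 - eps) N(0, sigma2) + eps N(0, lam * sigma2), assuming lam > 1."""
    log_clean = math.log(1.0 - eps) + log_normal_pdf(e, sigma2)
    log_contam = math.log(eps) + log_normal_pdf(e, lam * sigma2)
    d = log_clean - log_contam
    # Numerically stable logistic sigmoid of d (avoids overflow in math.exp).
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)
    return z / (1.0 + z)


def robust_mean(xs, sigma2=0.01, eps=0.1, lam=100.0, iters=100, tol=1e-10):
    """Hypothetical EM estimate of a common mean under the assumed
    contaminated Gaussian noise model. Returns (mean, per-sample weights)."""
    mu = statistics.median(xs)  # robust starting point
    weights = [1.0] * len(xs)
    for _ in range(iters):
        # E-step: responsibility of the clean component for each sample.
        r = [clean_responsibility(x - mu, sigma2, eps, lam) for x in xs]
        # Effective precision weight relative to 1/sigma2: clean samples
        # count fully, contaminated ones are scaled down by 1/lam.
        weights = [ri + (1.0 - ri) / lam for ri in r]
        # M-step: the weighted mean maximizes the expected log-likelihood.
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, weights


data = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]  # the last value is a gross outlier
mu, w = robust_mean(data)
print(f"plain mean  = {statistics.mean(data):.3f}")
print(f"robust mean = {mu:.3f}")
print("weights     =", [round(wi, 3) for wi in w])
```

    Here the plain mean is pulled toward the outlier at 10.0, while the EM estimate stays close to 1.0 because the outlier's weight falls to about 1/lam. In a full PPCA based model, responsibilities of this kind would, under similar assumptions, weight each sample's contribution to the parameter updates.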

    Finally, the prediction performance of the developed robust model is evaluated in the presence of data contamination and compared with that of the conventional PPCA based model and of specific robust PPCA based models. To further appraise the model's validity and practicality, two case studies were carried out for each development. A simulated data set with predefined characteristics, designed to highlight the presence of outliers, was used to demonstrate the robustness of the model. The advantages of the robust model are further illustrated on a real process data set from our industrial partners.

  • Graduation date
    Fall 2021
  • Degree
    Doctor of Philosophy
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.