Robust Latent Variable Modeling Using Probabilistic Slow Feature Analysis

  • Author / Creator
    Fan, Lei
  • Data-driven modeling approaches have been widely studied and applied in the process industries for inferential sensor development, process monitoring, fault detection, and early warning. Essential information about a process, such as its dynamics and the relationships between process variables, is buried in massive archives of historical data. These data are often high-dimensional and corrupted by different kinds of data irregularities, e.g. outliers, missing and multi-rate samples, and uncertain time delays. Latent variable modeling has become a preferred and successful way to address these irregularities while keeping the modeling approach computationally efficient. In most chemical processes, the process condition does not vary quickly and often has large inertia. It is therefore natural to regard slowly varying features as informative and as carrying most of the information about the process. With a probabilistic formulation, dynamic latent variable models based on extracting slowly varying features are developed in this thesis to address the aforementioned data irregularities and thus give reliable predictions of quality variables that are otherwise difficult to measure.
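
As a rough illustration of what "slowly varying features" means, the sketch below implements the classical deterministic linear slow feature analysis, not the probabilistic formulation developed in the thesis. The function name `linear_sfa` and the toy signals are assumptions made for this example only.

```python
import numpy as np

def linear_sfa(X, n_features):
    """Deterministic linear SFA (illustrative only, not the thesis model).

    X has shape (time, variables). Returns the slowest features and the
    variance of their first differences (smaller means slower).
    """
    Xc = X - X.mean(axis=0)
    # Whiten the data so that every direction has unit variance.
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W_white = evecs / np.sqrt(evals)
    Z = Xc @ W_white
    # In the whitened space, the slowest directions are the eigenvectors of
    # the covariance of the first differences with the smallest eigenvalues.
    d_evals, d_evecs = np.linalg.eigh(np.cov(np.diff(Z, axis=0), rowvar=False))
    W = W_white @ d_evecs[:, :n_features]  # eigh sorts eigenvalues ascending
    return Xc @ W, d_evals[:n_features]

# Toy data: two measurements that mix one slow and one fast source.
t = np.linspace(0.0, 2.0 * np.pi, 500)
slow = np.sin(t)
fast = np.sin(25.0 * t)
X = np.column_stack([slow + fast, 0.5 * slow - fast])

features, slowness = linear_sfa(X, n_features=1)
```

In this toy case the single extracted feature tracks the slow source up to sign and scale, which is the behaviour the probabilistic models build on.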

    Outliers are observations that lie far from other observations, and they are common in process variable measurements. A robust dynamic latent feature extraction model is first proposed in this thesis to handle outliers. By assuming that the observations follow the Student's t-distribution, which has heavier tails than the Gaussian, more probability mass is assigned to outlying observations so that they can be properly accounted for during modeling. In the feature extraction phase, a weighted Kalman gain is proposed because the t-distributed observations violate the Gaussian assumption of the traditional Kalman filter. Smoother and slower features can be extracted, and the impact of outliers is alleviated by the latent variance scale.
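
The effect of a heavy-tailed observation model on the filter update can be sketched as follows. This is a generic robust Kalman update under stated assumptions, not the weighted gain derived in the thesis: the weight `u = (nu + d) / (nu + delta2)` is the standard expected scale of a Student's t-distribution with `nu` degrees of freedom, and inflating the measurement noise to `R / u` is one common heuristic. The names `student_t_weight` and `weighted_kalman_update` are hypothetical.

```python
import numpy as np

def student_t_weight(residual, cov, nu):
    """Expected Student's t scale for a residual; small for outliers."""
    d = residual.shape[0]
    delta2 = float(residual @ np.linalg.solve(cov, residual))  # squared Mahalanobis distance
    return (nu + d) / (nu + delta2)

def weighted_kalman_update(x_pred, P_pred, y, H, R, nu):
    """One measurement update with the noise covariance inflated to R / u (assumed scheme)."""
    innov = y - H @ x_pred
    S = H @ P_pred @ H.T + R
    u = student_t_weight(innov, S, nu)
    S_w = H @ P_pred @ H.T + R / u          # outliers (small u) get a larger noise covariance
    K = P_pred @ H.T @ np.linalg.inv(S_w)   # weighted gain
    x_new = x_pred + K @ innov
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new, u

# Scalar toy example: an ordinary measurement versus a gross outlier.
x_pred = np.array([0.0])
P_pred = np.array([[1.0]])
H = np.array([[1.0]])
R = np.array([[1.0]])

x_in, _, u_in = weighted_kalman_update(x_pred, P_pred, np.array([0.5]), H, R, nu=3.0)
x_out, _, u_out = weighted_kalman_update(x_pred, P_pred, np.array([20.0]), H, R, nu=3.0)
```

With these numbers a standard Kalman update would move the state halfway to the outlier, whereas the weighted update shrinks the gain so the state moves only slightly.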

    The next contribution of this thesis is a semi-supervised model based on probabilistic slow feature analysis that includes information from the quality variables in the extracted latent features while accounting for missing data in the quality variables. An approach that augments both input and output variables is proposed. It can deal with different missing data patterns, i.e. data missing at random or multi-rate sampling. During latent feature extraction, quality variable samples are used whenever they are available. Compensation by past quality variable samples leads to better prediction of future samples.
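
One way to picture the augmented-observation idea is a Kalman filter whose observation vector stacks a process variable and a quality variable, and whose update step uses only the entries that are present at each time. The sketch below is a plain Gaussian filter with known parameters, offered as an assumption-laden illustration rather than the semi-supervised model of the thesis; `kalman_filter_missing` and all numerical values are made up for the example.

```python
import numpy as np

def kalman_filter_missing(Y, F, H, Q, R, x0, P0):
    """Kalman filter that skips missing (NaN) entries of each observation."""
    n = len(x0)
    x, P = x0.copy(), P0.copy()
    states = np.zeros((len(Y), n))
    for t, y in enumerate(Y):
        # Time update (prediction).
        x = F @ x
        P = F @ P @ F.T + Q
        # Measurement update with whichever rows are available.
        obs = ~np.isnan(y)
        if obs.any():
            Ho = H[obs]
            Ro = R[np.ix_(obs, obs)]
            S = Ho @ P @ Ho.T + Ro
            K = P @ Ho.T @ np.linalg.inv(S)
            x = x + K @ (y[obs] - Ho @ x)
            P = (np.eye(n) - K @ Ho) @ P
        states[t] = x
    return states

# Toy multi-rate data: the process variable is measured at every step,
# the quality variable only at every fifth step.
rng = np.random.default_rng(0)
T = 50
F = np.array([[0.95]])
H = np.array([[1.0], [2.0]])   # rows: process variable, quality variable
Q = np.array([[0.01]])
R = np.diag([0.1, 0.05])

s = np.zeros(T)
for t in range(1, T):
    s[t] = 0.95 * s[t - 1] + rng.normal(scale=0.1)
Y = np.outer(s, H[:, 0]) + rng.normal(scale=np.sqrt(np.diag(R)), size=(T, 2))
Y[np.arange(T) % 5 != 0, 1] = np.nan

states = kalman_filter_missing(Y, F, H, Q, R, x0=np.zeros(1), P0=np.eye(1))
```

When every entry of an observation is missing, the filter simply propagates the prediction, so the same routine covers both random missingness and multi-rate sampling.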

    Another irregularity of lab samples of quality variables is their uncertain time delays. In many cases, quality variables are sampled and analyzed manually by operators because real-time on-line analysis is not possible. Various factors in this manual procedure, e.g. human error, manual sampling, lab analysis, and data recording, can result in time-varying time delays in the quality variable samples. To address this issue, another latent variable, a delay indicator that evolves according to a hidden Markov model, is introduced in a variational Bayesian framework. Preferences about the model parameters are expressed as prior distributions. More accurate and meaningful dynamic latent features can be extracted using the shifted samples of the quality variables.
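
To make the delay indicator concrete, the sketch below simulates a delay sequence from a Markov chain and shifts recorded lab samples back to the times at which they were actually drawn. In the thesis the delay is latent and inferred within the variational Bayesian framework; here it is treated as known purely for illustration, and the transition matrix `A`, the initial distribution `pi0`, and both function names are assumptions.

```python
import numpy as np

def sample_delay_chain(A, pi0, n, rng):
    """Draw n delay indicators from a Markov chain with transition matrix A."""
    delays = np.empty(n, dtype=int)
    delays[0] = rng.choice(len(pi0), p=pi0)
    for k in range(1, n):
        delays[k] = rng.choice(len(pi0), p=A[delays[k - 1]])
    return delays

def realign(recorded, delays):
    """Move each recorded sample back by its delay (recorded[t] was drawn at t - delays[t])."""
    aligned = np.full(len(recorded), np.nan)
    for t, (y, d) in enumerate(zip(recorded, delays)):
        if not np.isnan(y) and t - d >= 0:
            aligned[t - d] = y
    return aligned

rng = np.random.default_rng(1)
A = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])   # delays of 0, 1, or 2 steps that tend to persist
pi0 = np.array([1.0, 0.0, 0.0])

T = 60
true_quality = np.sin(0.1 * np.arange(T))
sample_times = np.arange(10, T, 10)               # times the samples were drawn
sample_delays = sample_delay_chain(A, pi0, len(sample_times), rng)

recorded = np.full(T, np.nan)                     # what the historian stores
delays = np.zeros(T, dtype=int)
for s_time, d in zip(sample_times, sample_delays):
    recorded[s_time + d] = true_quality[s_time]   # sample shows up d steps late
    delays[s_time + d] = d

aligned = realign(recorded, delays)
```

Once the samples are shifted back, they line up with the process measurements taken at the same instant, which is what allows the latent features to use them correctly.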

    Time-varying time delays exist not only in the quality variables but also in the fast-sampled process variables, because of their distributed locations in the plant. Changes in process conditions, varying flow velocities, changing viscosity of the transported materials, etc., change the delay relative to the target quality variable. A generalized formulation of the earlier work is proposed to address this issue. Multiple Markov chains are introduced to represent the different time-varying time delay sequences of the different process variables. Dynamic latent features are extracted using both the shifted process variables and the scattered quality variable samples. By accounting for the shifted observations, better predictions of the quality variable are obtained.

    The validity and practicality of the proposed probabilistic latent variable modeling approaches are verified through numerical examples, benchmark simulations, experimental studies, and industrial applications. Specifically, in the application to SAGD well pair water content prediction, performance is improved by applying the proposed methods when data irregularities are considered.

  • Subjects / Keywords
  • Graduation date
    Fall 2020
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-h7a2-z709
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.