Multivariate Exploratory Data Analysis of Spatial Data to Support Geostatistical Modeling

Zhang, Haoze

doi:10.7939/r3-a59t-6s56

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Zhang, Haoze
Geostatistical modeling takes geological data as inputs and builds statistical models for resource
prediction. Geostatistics consists of several components, including preprocessing, modeling, and
postprocessing. Exploratory data analysis (EDA) is an early step in preprocessing. It provides the
characteristics of the data and helps identify erroneous or inconsistent data. In the context of
geostatistics, missing data and below detection limit (BDL) data are important anomalies to be
understood in EDA. Missing data are problematic in EDA techniques such as principal component
analysis (PCA). BDL data also cause problems in cluster analysis and other analyses. Geostatistical
models need to be built in stationary domains, so multivariate and spatial cluster analysis is
another important aspect of EDA. It separates data into smaller groups in which data share
similar features.
This thesis covers multiple aspects of geostatistical EDA. A data map examines missing data and
shows the number of missing values for each variable and location. A combined permutation and
Kolmogorov–Smirnov (KS) test identifies whether the missingness in variables is systematic. BDL
data are investigated with univariate and bivariate methods. A BDL statistics table complements
histograms. Three methods measure the spikiness of data. Bivariate analysis compares observed
distributions with the distributions expected under full independence of BDL occurrence. A
Kullback–Leibler (KL) test quantifies the difference between the distributions, identifying
combinations of variables in which BDL occurrence may be dependent. This helps in understanding
the reasons for BDL data.
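The bivariate BDL comparison described above can be illustrated with a minimal sketch. It assumes each variable has been reduced to a 0/1 flag per sample (1 = below detection limit) and uses the Kullback–Leibler divergence between the observed joint distribution of two flags and the distribution expected if the flags were independent. The function name `bdl_kl_divergence` and the example flags are hypothetical and are not taken from the thesis, whose exact procedure may differ.

```python
import math
from collections import Counter


def bdl_kl_divergence(flags_a, flags_b):
    """KL divergence (in nats) between the observed joint distribution of two
    BDL indicator variables and the distribution expected under independence.

    Hypothetical helper for illustration; inputs are equal-length sequences
    of 0/1 (or bool) flags, one entry per sample.
    """
    if len(flags_a) != len(flags_b) or len(flags_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(flags_a)
    a = [bool(x) for x in flags_a]
    b = [bool(x) for x in flags_b]
    joint = Counter(zip(a, b))   # observed counts of (flag_a, flag_b) pairs
    p_a = sum(a) / n             # marginal BDL frequency of variable A
    p_b = sum(b) / n             # marginal BDL frequency of variable B
    kl = 0.0
    for fa in (False, True):
        for fb in (False, True):
            observed = joint[(fa, fb)] / n
            expected = (p_a if fa else 1 - p_a) * (p_b if fb else 1 - p_b)
            if observed > 0:     # 0 * log(0) is taken as 0
                kl += observed * math.log(observed / expected)
    return kl


# Made-up flags: the first pair is exactly independent, the second pair
# is BDL in exactly the same samples.
kl_independent = bdl_kl_divergence([1, 1, 0, 0], [1, 0, 1, 0])
kl_dependent = bdl_kl_divergence([1, 1, 1, 0, 0, 0, 0, 0],
                                 [1, 1, 1, 0, 0, 0, 0, 0])
print(kl_independent, kl_dependent)
```

A value near zero suggests BDL occurrence in the two variables is close to independent, while larger values flag variable pairs worth a closer look. Any threshold for "dependent" would be a choice made by the analyst and is not specified here.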
The handling of BDL data in cluster analysis is addressed, including a workflow that finds the
optimal number of clusters. Tests on synthetic data examine the compatibility of the workflow with
different data transformations and clustering methods. K‑means is a suitable clustering method for
dealing with BDL spikes. Four transformations compatible with the workflow are combined with
k‑means to examine clusters in real data. The trade‑off between spatial continuity and multivariate
continuity in cluster analysis is addressed. A novel classification method is proposed to find the
optimal clustering and domain labels. The classification algorithm takes multiple sets of ensemble
clustering labels as inputs. The domains are assigned based on the clustering labels and two
hyperparameters: spatial weight and number of domains. The matrix of classification results shows
that a higher spatial weight results in more continuous domains. Flow simulation results show that
the domain label assignment affects the performance of the final geostatistical models, because
flow responses are highly sensitive to spatial and multivariate continuity.
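As a rough illustration of k‑means applied to data with a BDL spike, the sketch below runs a plain one-dimensional k‑means for several cluster counts and reports the within-cluster sum of squares (WCSS). This is not the thesis workflow: the function `kmeans_1d`, the quantile-based initialization, the synthetic values, and the use of WCSS as the comparison measure are all assumptions made for the example.

```python
def kmeans_1d(values, k, n_iter=100):
    """Plain Lloyd's k-means on 1-D data with deterministic quantile-based
    initial centers. Returns (labels, centers, wcss). Illustrative only."""
    ordered = sorted(values)
    n = len(ordered)
    # Initial centers taken at evenly spaced quantiles of the sorted data.
    centers = [ordered[((2 * i + 1) * n) // (2 * k)] for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda j: (v - centers[j]) ** 2)

    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        # Keep the old center if a cluster ends up empty.
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers

    labels = [nearest(v) for v in values]
    wcss = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return labels, centers, wcss


# Synthetic example: 20 samples stuck at a made-up detection limit of 0.01
# (the BDL spike) plus two groups of measured values.
values = ([0.01] * 20
          + [4.6 + 0.1 * i for i in range(10)]
          + [9.6 + 0.1 * i for i in range(10)])

results = {k: kmeans_1d(values, k) for k in (1, 2, 3)}
for k, (labels, centers, wcss) in results.items():
    print(k, [round(c, 2) for c in centers], round(wcss, 2))
```

In this synthetic case the spike ends up as its own cluster at k = 3 and the WCSS drops sharply there. Real multivariate data would also need the transformations and the spatial weighting that the abstract describes, neither of which is modeled in this sketch.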
Subjects / Keywords
- Geostatistics
Graduation date

Spring 2022
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-a59t-6s56
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Civil and Environmental Engineering
Specialization
- Mining Engineering
Supervisor / co-supervisor and their department(s)
- Deutsch, Clayton (Civil and Environmental Engineering)