Bridging Gaps in Exploratory Data Analysis using Dimensionality Reduction

  • Author / Creator
    Ghosh, Aindrila
  • Organizational transactions generate immense amounts of data every day. Decisions made using such data matter not only for their financial impact on the business; they also regulate its relationships with other businesses in the supply chain. Much research has focused on facilitating more efficient data-driven decision making, and in past years researchers have explored directions that range from business to technical areas. These directions include understanding specific business disciplines in order to identify their challenges and gaps in decision making, creating Exploratory Data Analysis (EDA) tools that support better visual interpretation of data, and producing algorithms that compress and summarize high-dimensional industrial datasets so that they can be analyzed using spatial techniques. However, many open challenges remain in each of these areas. For example, despite their financial importance, the data-generating processes of many business units, such as Sales-and-Subscriptions (S&S) renewal, have received limited attention from researchers. Moreover, given the abundance of EDA tools and data compression algorithms, analysts often struggle to select the most appropriate solution for their analytical context. Furthermore, the highly technical nature of data summarization techniques makes their evaluation, interpretation, and usage challenging for both novice and expert data analysts. Following an action research method, this research attempts to bridge several gaps in all of the above-mentioned areas. First, a longitudinal study across multiple organizations is performed to identify the state-of-the-art industrial process of data-driven decision making in the Sales-and-Subscriptions (S&S) business unit.
The analysis of the business unit shows that analyzing customers’ experiences with the seller organization can help mitigate renewal risks. Hence, in the next part of the research, 50 cutting-edge visual EDA tools are investigated for their ability to assist with visually exploring large industrial datasets. The focus then shifts to Dimensionality Reduction (DR), a popular area of data summarization and visual EDA. More specifically, three challenges associated with the DR process are addressed: selecting the most appropriate algorithm, interpreting its outcome, and evaluating the quality of the reduced dimensions. To achieve these research goals, a large-scale experimental study is first performed in which 15 of the most popular DR techniques are statistically analyzed, and the first-ever practitioners’ guideline for selecting DR algorithms in a given analytical context is created. Next, two novel algorithms, Local Approximation of Preserved Structure (LAPS) and Global Approximation of Projection Space (GAPS), are presented that help with interpreting the structural quality of the outcome of any DR technique. Finally, to enable a user-driven evaluation of DR methods, a visual interactive toolkit, Visual Explanations of Preserved Structure (VisExPreS), is presented with Proactively Guided LAPS and GAPS. The value and novelty of the presented solutions are demonstrated through extensive evaluations throughout the thesis.

  • Subjects / Keywords
  • Graduation date
    Fall 2020
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-tsdt-2s04
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.