
Commonsense Knowledge Generation and Analysis using Deep Learning Models

  • Author / Creator
    Rezaei Sarchoghaei, Navid
  • There has been a renewed interest in commonsense as a stepping stone toward achieving human-level intelligence. By digesting enormous amounts of data in different forms, such as visual, lingual, and sensory, humans are able to create a world model for themselves. It is hypothesized that this knowledge is the basis for commonsense, which can be defined as a collection of models of the world that capture the plausibility of entities and their interactions. This commonsensical world model helps humans learn new skills in very few trials, as they can predict the consequences of actions and plan and reason about the next steps.

    In our work, we explore and experiment with how we can generate commonsense knowledge and how it can ultimately benefit deep learning models to gain commonsense. We also analyze the weaknesses of large language models (LLMs) in a commonsense context and provide solutions to improve LLMs in commonsensical tasks.

    Inspired by how toddlers learn about their environment, we first introduce a methodology to generate commonsense knowledge using only visual input. We use knowledge graphs as the preferred method of data storage, as they are easy to access and inexpensive to expand. We further enrich the stored knowledge with plausibility weights for triples and with contextual information.
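The weighted triple store described above can be sketched as follows. This is a minimal illustrative implementation, not the thesis code: the `CommonsenseGraph` class name and the example entities and weights are assumptions made for demonstration.

```python
# Illustrative sketch: a knowledge-graph triple store where each
# (head, relation, tail) triple carries a plausibility weight and
# optional context, and expansion is a constant-time dictionary insert.

class CommonsenseGraph:
    """Stores commonsense triples such as (cup, on, table) with weights."""

    def __init__(self):
        # Maps (head, relation, tail) -> {"weight": float, "context": str}
        self.triples = {}

    def add(self, head, relation, tail, weight=1.0, context=""):
        """O(1) insertion; re-adding a triple updates its weight/context."""
        self.triples[(head, relation, tail)] = {
            "weight": weight,
            "context": context,
        }

    def plausibility(self, head, relation, tail):
        """Return the stored plausibility weight, or 0.0 if unseen."""
        entry = self.triples.get((head, relation, tail))
        return entry["weight"] if entry else 0.0


graph = CommonsenseGraph()
# A plausible triple and its implausible inverse, weighted accordingly.
graph.add("cup", "on", "table", weight=0.9, context="kitchen scene")
graph.add("table", "on", "cup", weight=0.05, context="kitchen scene")
```

Because triples are keyed directly, queries and insertions stay cheap as the graph grows, which is the access pattern the paragraph above motivates.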

    As language is the next important step in a toddler’s mental model of the world, we experiment with transformer-based language models to expand the vision-based commonsense even further. Through these experiments, we observe that larger language models, trained in an unsupervised fashion, embed more commonsense than their smaller counterparts. Symbolic, vetted storage of commonsense knowledge from different sources can improve the commonsense capabilities of smaller language models, and such smaller, offline models benefit resource-restricted use cases such as smartphones and self-driving cars.

    During our research, we noticed the high cost of human annotations to gather human commonsense datasets used to train language models. As a by-product of our research, we proposed a model-agnostic prompt technique to reduce costly human textual annotations for fine-tuning language models.

    Lastly, we demonstrate that out-of-the-ordinary questions can catch LLMs off guard. We illustrate how "negated complementary" questions adversely affect model responses, and we propose a model-agnostic methodology to improve performance in such scenarios. Our method outperforms few-shot generation from GPT-3 by more than 11 points and, more importantly, highlights the significance of studying the responses of large language models in different commonsensical scenarios.
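To make the "negated complementary" setup concrete, a question pair can be constructed as below. This is a hedged illustration only: the helper function and the whale example are invented for demonstration and are not from the thesis, which studies how negating a question's predicate degrades LLM answers.

```python
# Illustrative only: build the negated complement of a yes/no question
# by a naive string edit on its predicate. Real evaluation would send
# both variants to a language model and compare answer consistency.

def negated_complementary(question: str, predicate: str) -> str:
    """Return the question with its predicate negated, e.g.
    'Is a whale a mammal?' -> 'Is a whale not a mammal?'."""
    return question.replace(predicate, f"not {predicate}")


original = "Is a whale a mammal?"
negated = negated_complementary(original, "a mammal")
# A consistent model should flip its yes/no answer between the two;
# the thesis observes that models often fail to do so.
```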

  • Subjects / Keywords
  • Graduation date
    Spring 2023
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-81vd-sc51
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.