Grounding Concepts to Vision via Descriptions

  • Author / Creator
    Ogezi, Michael
  • This thesis introduces a new approach for grounding concepts to vision using visual descriptions, which are text-based descriptions of visual attributes. We hypothesize that these descriptions can enhance the grounding of concepts to vision, thereby improving performance in vision-language tasks. We also suggest that these descriptions can be effectively produced using pre-trained language models. To validate these hypotheses, we conduct two studies.

    In the first study, we address the task of visual word sense disambiguation. This task aims to select the image that best represents the meaning of a word in context. Here, we demonstrate that augmenting the original context with rich visual descriptions produced by a language model significantly improves performance.

    In the second study, we attempt to produce visual descriptions for arbitrary, concrete concepts, focusing on two downstream tasks: zero-shot image classification and zero-shot class-conditional image generation. Primarily, we demonstrate that conditioning a large language model with lexico-semantic knowledge from a semantic knowledge base produces richer and better-grounded visual descriptions than previous methods. Furthermore, these visual descriptions result in substantial empirical improvements in the aforementioned downstream tasks.

    Overall, this thesis confirms our initial hypotheses and demonstrates that visual descriptions offer a robust mechanism for grounding concepts to the visual domain.

  • Subjects / Keywords
  • Graduation date
    Fall 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-yxnj-cw36
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.