The Contrastive Gap: A New Perspective on the ‘Modality Gap’ in Multimodal Contrastive Learning
- Author / Creator
- Fahim, Abrar
Learning jointly from images and texts using contrastive pre-training has emerged as an effective method to train large-scale models with a strong grasp of semantic image concepts. For instance, CLIP, pre-trained on a large corpus of web data, excels in tasks like zero-shot image classification, object detection, geolocalization, and more. These contrastive models embed input images and texts into a shared representational space.
Recently, it was discovered that models like CLIP show a "modality gap", where image and text embeddings occupy disjoint areas in the representational space. Previous research attributes this gap to factors like data artifacts (mismatched pairs), model architecture artifacts (the cone effect), and the nature of the loss landscape (getting stuck in local minima). In this thesis, we demonstrate that even after accounting for these factors, the contrastive loss itself creates this gap during training. We propose renaming this phenomenon the "contrastive gap" and show that it stems from low uniformity in the CLIP space, where embeddings occupy only a small portion of the latent space. We show that optimizing for uniformity and alignment in the CLIP space reduces the contrastive gap. Our experiments show that this modified representational space achieves better performance on downstream tasks like zero-shot image classification and multimodal arithmetic, suggesting that closing the contrastive gap improves CLIP performance.
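To make the terms "alignment", "uniformity", and "gap" concrete, the following is a minimal sketch on toy data, not the thesis's implementation. It assumes the alignment and uniformity measures in the form commonly attributed to Wang and Isola (2020), with squared Euclidean distance and temperature t = 2, and it uses the distance between the image and text centroids as one common proxy for the gap. The thesis may define these quantities differently. All function names and the synthetic embeddings are hypothetical.

```python
import math
import random

def normalize(v):
    """Project a vector onto the unit hypersphere, as CLIP does with embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def alignment(images, texts):
    """Mean squared distance between matched image-text pairs (lower = better aligned)."""
    return sum(sq_dist(i, t) for i, t in zip(images, texts)) / len(images)

def uniformity(points, t=2.0):
    """Log of the mean Gaussian potential over distinct pairs (lower = more uniform)."""
    n = len(points)
    total = sum(
        math.exp(-t * sq_dist(points[a], points[b]))
        for a in range(n)
        for b in range(a + 1, n)
    )
    return math.log(total / (n * (n - 1) / 2))

def centroid_gap(images, texts):
    """Euclidean distance between the image centroid and the text centroid."""
    dim = len(images[0])
    ci = [sum(v[d] for v in images) / len(images) for d in range(dim)]
    ct = [sum(v[d] for v in texts) / len(texts) for d in range(dim)]
    return math.sqrt(sq_dist(ci, ct))

# Synthetic embeddings (assumption: toy Gaussian data, not real CLIP outputs).
rng = random.Random(0)
DIM, N = 8, 50

def noise(scale=1.0):
    return [rng.gauss(0.0, scale) for _ in range(DIM)]

def shifted(offset):
    """A noisy point pushed along the first axis, then normalized."""
    v = noise()
    v[0] += offset
    return normalize(v)

# "Gapped" space: images and texts sit in two separate regions of the sphere.
gapped_img = [shifted(+5.0) for _ in range(N)]
gapped_txt = [shifted(-5.0) for _ in range(N)]

# "Closed" space: images spread over the sphere, each text next to its image.
closed_img = [normalize(noise()) for _ in range(N)]
closed_txt = [
    normalize([a + b for a, b in zip(v, noise(0.1))]) for v in closed_img
]

for name, img, txt in [
    ("gapped", gapped_img, gapped_txt),
    ("closed", closed_img, closed_txt),
]:
    print(
        name,
        "alignment=%.3f" % alignment(img, txt),
        "uniformity=%.3f" % uniformity(img + txt),
        "centroid_gap=%.3f" % centroid_gap(img, txt),
    )
```

In this toy setup, the embeddings that fill more of the sphere and keep matched pairs close score lower on all three quantities, which is the direction the abstract describes when it links low uniformity to the gap.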
- Graduation date
- Fall 2024
- Type of Item
- Thesis
- Degree
- Master of Science
- License
- This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.