Alleviate the Domain-shift Problem of Vision Tasks
-
- Author / Creator
- Huo, Dong
-
Computer vision tasks have seen breakthroughs in recent years thanks to the emergence of deep learning (DL). However, there exist different types of domain-shift problems that may impact the performance of DL-based methods. In low-level vision tasks, e.g., image restoration, the degradation of the training data can differ from that of the testing data. In high-level vision tasks such as image segmentation, models trained on common-object datasets cannot be applied to some specific objects. In this dissertation, I present novel learning strategies to alleviate the domain-shift problems of both low-level and high-level vision tasks.
I start with low-level vision tasks, specifically dynamic scene deblurring/deconvolution (2D spatial degradation). The degradation matrices of the training data are rarely seen during testing, so I propose two approaches that tackle this problem from different angles. In the first approach, a model trained on paired datasets adaptively adjusts to different magnitudes and directions of motion blur, which enables it to handle unseen degradations. Although this improves generalization, the model still depends on synthetic data for training because the number of real-world paired images available for training is limited; hence, its performance drops on real blurred data. In the second approach, I adopt the deep image prior (DIP) to bypass supervised training and use only a single degraded image to update the neural network parameters, which is more flexible with respect to varying blurs and is not affected by the training data.
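To make the DIP idea concrete, the sketch below shows a single-image fitting loop in PyTorch under simplifying assumptions: the blur kernel is treated as known (the actual method also handles unknown blur), and the small network (TinyConvNet) is illustrative rather than the architecture used in the dissertation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyConvNet(nn.Module):
    """Small convolutional network that maps a fixed noise input to a latent sharp image."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z)

def dip_deblur(blurred, kernel, steps=2000, lr=1e-3):
    """Fit the network to a single blurred image (1, 3, H, W): re-blurring the
    network output with the kernel should reproduce the observed input."""
    net = TinyConvNet().to(blurred.device)
    z = torch.randn_like(blurred)                  # fixed random input, never updated
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    k = kernel.expand(3, 1, *kernel.shape[-2:])    # depthwise blur kernel (odd size assumed)
    for _ in range(steps):
        sharp = net(z)
        reblurred = F.conv2d(sharp, k, padding=kernel.shape[-1] // 2, groups=3)
        loss = F.mse_loss(reblurred, blurred)      # data term on the single test image
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net(z).detach()                         # estimated sharp image
```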
In 3D, the degradation concerns spatial resolution in the context of novel view synthesis. Along this direction, I propose a novel view synthesis task that reconstructs novel views of 3D objects. In this dissertation, I simplify the problem into texture generation for untextured 3D meshes. Novel views are synthesized with a depth-conditioned image generation model, using the source view as guidance and the depth rendered from a given mesh. Similar to the 2D spatial degradation problem, most texture generation models are trained on synthetic data due to the scarcity of real training data, and therefore cannot generate photo-realistic textures. Recently, Stable Diffusion (SD), trained on large real-world image datasets for image generation, has been applied to many downstream tasks, e.g., geometry generation and novel-view synthesis. I adopt the pretrained SD without fine-tuning to generate photo-realistic textures for 3D objects, conditioned on an extra textual prompt.
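For intuition, the sketch below runs a pretrained depth-conditioned Stable Diffusion pipeline via the Hugging Face diffusers library; the checkpoint names and the single-view setup are assumptions for illustration, and the dissertation's full texture-generation pipeline (mesh rendering, view consistency) is not shown.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Depth-conditioned ControlNet on a pretrained Stable Diffusion backbone
# (public checkpoints used as placeholders, not the dissertation's own weights).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

depth = Image.open("rendered_depth.png").convert("RGB")    # depth rendered from the untextured mesh
image = pipe(prompt="a photo-realistic wooden chair",      # extra textual prompt guiding the texture
             image=depth, num_inference_steps=30).images[0]
image.save("textured_view.png")
```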
Going from the spatial to the spectral domain, I address the problem of spectral degradation with the objective of recovering the spectral reflectance from a single RGB image. Due to the difficulty of obtaining paired training data, most methods are trained and tested on synthetic data. As with other degradation problems, the synthetic data differ from the real data, so models trained on these datasets fail when applied to real data. To address this issue, I propose to adopt meta-auxiliary learning: the model is trained on synthetic data but adapted to the real data at test time with only a few gradient updates.
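As a rough illustration of the test-time adaptation step, the sketch below takes a few gradient steps on an auxiliary, label-free loss (here, reconstructing the RGB input) before running the primary spectral-recovery head; the two-head model interface and hyperparameters are assumptions, and the meta-auxiliary training stage itself is not shown.

```python
import copy
import torch

def adapt_and_predict(model, rgb, aux_loss_fn, steps=5, lr=1e-5):
    """Test-time adaptation on a single RGB image (1, 3, H, W).

    Assumes `model(rgb)` returns (spectra, rgb_recon): the primary spectral
    prediction and an auxiliary reconstruction of the input."""
    adapted = copy.deepcopy(model)                  # keep the meta-trained weights intact
    opt = torch.optim.Adam(adapted.parameters(), lr=lr)
    for _ in range(steps):
        _, rgb_recon = adapted(rgb)
        loss = aux_loss_fn(rgb_recon, rgb)          # no ground-truth spectra required
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        spectra, _ = adapted(rgb)                   # prediction from the adapted model
    return spectra
```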
For high-level vision tasks, e.g., semantic segmentation, I address the glass surface segmentation problem, where semantic segmentation methods trained on common objects fail to detect transparent glass surfaces. Because glass transmits visible and infrared (thermal) light differently, an extra thermal camera is exploited for better detection. In particular, I collected an extensive paired RGB-thermal image dataset with manually labeled masks for model training, and aggregated the trained model with existing semantic segmentation methods to generalize semantic segmentation to glass scenes.
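One simple way to aggregate a dedicated glass detector with an off-the-shelf semantic segmentation model is sketched below: pixels confidently flagged as glass overwrite the generic labels. The class id, threshold, and tensor shapes are illustrative assumptions rather than the dissertation's exact procedure.

```python
import torch

GLASS_CLASS = 200  # assumed extra label id reserved for glass surfaces

def aggregate(seg_logits, glass_prob, threshold=0.5):
    """Combine generic segmentation logits (B, C, H, W) with a glass
    probability map (B, 1, H, W) predicted from the RGB-thermal pair."""
    labels = seg_logits.argmax(dim=1)               # (B, H, W) generic class map
    glass_mask = glass_prob.squeeze(1) > threshold  # pixels detected as glass
    return labels.masked_fill(glass_mask, GLASS_CLASS)
```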
-
- Subjects / Keywords
-
- Graduation date
- Fall 2024
-
- Type of Item
- Thesis
-
- Degree
- Doctor of Philosophy
-
- License
- This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.