Unsupervised domain adaptation for object detection and whole slide image classification

  • Author / Creator
    Yang, Yuchen
  • Deep neural networks (DNNs) have developed rapidly in recent years. While they show promising results in various computer vision tasks, DNNs typically suffer accuracy loss due to the domain shift from a source domain to a target domain. To mitigate this loss without labels from the target domain, unsupervised domain adaptation (UDA) approaches have been proposed.

    Compared to most UDA studies, which target image classification and pixel-level classification (image segmentation), UDA for object detection is a relatively new area. A popular pipeline applies adversarial training with a domain discriminator, which aligns the feature distributions of the source and target domains.

    Existing UDA methods for object detection extract image-level features and adapt the full features directly, as in UDA for classification tasks. However, aligning full image-level features as a whole is not ideal for object detection, because varied backgrounds can interfere with the adaptation. To avoid aligning the full feature, this thesis proposes a novel foreground-focused domain adaptation (FFDA) framework. FFDA mines the loss of the domain discriminators so that the alignment concentrates on the foreground during backpropagation.

    FFDA collects target predictions and source image labels and uses them to generate mining masks that outline foreground regions. It then applies the masks to the image-level and instance-level domain discriminators so that backpropagation occurs only on the mined regions. In addition, by reinforcing this foreground-focused adaptation across multiple layers of the detector model, FFDA gives the detector a significant accuracy boost on target domain prediction. Compared with previous methods, FFDA reaches new state-of-the-art accuracy on adaptation from the Cityscapes dataset to the Foggy Cityscapes dataset. FFDA also demonstrates competitive results on other datasets covering various autonomous driving scenarios.
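
    The masking idea described above can be illustrated with a small sketch. This is not the thesis implementation: the function names `boxes_to_mask` and `masked_domain_loss`, the box format `(x1, y1, x2, y2)` in feature-map coordinates, and the use of binary cross-entropy as the discriminator loss are all assumptions made for illustration. The sketch only shows how a foreground mask can restrict a per-location domain loss to the mined regions.

```python
import math

def boxes_to_mask(boxes, height, width):
    """Build a binary foreground mask from boxes.

    Assumed (hypothetical) box format: (x1, y1, x2, y2) in feature-map
    coordinates, with x2 and y2 exclusive.
    """
    mask = [[0.0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                mask[y][x] = 1.0
    return mask

def masked_domain_loss(probs, domain_label, mask, eps=1e-7):
    """Mean binary cross-entropy over masked locations only.

    probs: per-location discriminator outputs in (0, 1).
    domain_label: 0 for source, 1 for target (an assumed convention).
    Locations where mask == 0 contribute nothing to the loss.
    """
    total, count = 0.0, 0.0
    for row_p, row_m in zip(probs, mask):
        for p, m in zip(row_p, row_m):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            bce = -(domain_label * math.log(p)
                    + (1 - domain_label) * math.log(1.0 - p))
            total += m * bce
            count += m
    return total / count if count > 0 else 0.0

# Toy usage: one 2x2 foreground box on a 4x4 feature map.
mask = boxes_to_mask([(1, 1, 3, 3)], height=4, width=4)
probs = [[0.5] * 4 for _ in range(4)]
loss = masked_domain_loss(probs, domain_label=1, mask=mask)
```

    In a framework with automatic differentiation, multiplying the per-location loss by the mask in this way would also zero the gradients at background locations, which is the effect the abstract describes.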

    In addition to the object detection problem, this thesis also discusses the application of UDA to whole slide image (WSI) classification. WSI classification is challenging compared to general image classification because of the high resolution of WSIs and their scattered key information. Previous work provided a novel deep Fisher vector coding pipeline for WSI classification. However, this pipeline suffers from the same accuracy drop when deployed on a set of WSIs from a different institution for the same task. This limits the practical use of the pipeline, especially when diagnoses for WSIs are hard to obtain.

    On the other hand, previous works applying UDA to medical imaging have typically focused on adapting small microscopy image samples or image patches extracted from WSIs. UDA for classifying entire WSIs has not yet been discussed, owing to the limited number of pipelines and datasets that support WSI classification.

    This thesis aims to provide a UDA solution that enhances the robustness of the previous pipeline by mitigating the accuracy drop across different WSI datasets. The solution inserts domain classifiers into the previous pipeline at different stages to align the features during training. It is evaluated by computing confusion matrices before and after adaptation. The results demonstrate that, with domain classifiers placed at different stages, the pipeline shows an accuracy boost on target WSI data.
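
    The evaluation step mentioned above, comparing confusion matrices before and after adaptation, could look roughly like the following sketch. The helper names `confusion_matrix` and `accuracy` are hypothetical, and the label lists are placeholder toy values chosen only to make the example runnable; they are not results from the thesis.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal of a confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

# Placeholder toy labels (NOT thesis data): two classes, four target slides.
y_true = [0, 0, 1, 1]
pred_before = [0, 1, 1, 0]  # hypothetical predictions without adaptation
pred_after = [0, 0, 1, 0]   # hypothetical predictions with adaptation

cm_before = confusion_matrix(y_true, pred_before, num_classes=2)
cm_after = confusion_matrix(y_true, pred_after, num_classes=2)
print(cm_before, accuracy(cm_before))  # [[1, 1], [1, 1]] 0.5
print(cm_after, accuracy(cm_after))    # [[2, 0], [1, 1]] 0.75
```

    Comparing the two matrices class by class shows where adaptation helps, which a single accuracy figure would hide.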

  • Subjects / Keywords
  • Graduation date
    Fall 2021
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-t9zp-xq05
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.