Improving Semantic Image Segmentation by Object Localization

Zhang, Zichen

doi:10.7939/R3TX35N29

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Zhang, Zichen
Semantic segmentation is the task of classifying every pixel in an image. In recent years, methods based on Fully Convolutional Networks (FCN) have dominated this field in terms of segmentation accuracy. We are interested in tackling two challenges that these methods face. First, pixel-level labels for training the network are expensive to acquire. Second, FCNs often have trouble with data in which positive and negative samples are imbalanced. This issue often arises in domains such as medical imaging and satellite imagery analysis, where the object of interest can be very small. The large number of negative samples can overwhelm the positive samples during training, leading the network to learn a biased representation. In this thesis, we investigate how an object localization mechanism can address these two challenges. We propose an end-to-end neural network that improves the segmentation accuracy of FCN by incorporating an object localization unit. The network first performs object localization, and the result is then used as a cue to guide the training of the segmentation network. The two steps share convolutional features. This allows us to leverage object detection labels to help train the segmentation network, alleviating the need for large-scale pixel-level labels. To avoid applying max pooling to object proposals, which limits spatial accuracy, we introduce a new type of convolutional layer named ROI convolution. It applies convolution directly to the object proposals in one shot, without passing them individually through the downstream network. We show that this layer is differentiable, which allows the network to be trained end-to-end. To demonstrate the efficacy of our method, we apply it to the problem of medical image segmentation. With the object localization unit, our method performs well despite the high class imbalance, and it outperforms existing methods on small object segmentation.
To better understand the proposed method and the impact of ROI convolution, we also conducted ablation studies and experimented on an endoscopic image dataset with balanced data.
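
The class-imbalance issue mentioned in the abstract can be illustrated with a minimal sketch. This is not the method of the thesis. The mask size, the object size, and the function names below are assumptions chosen only for illustration. The sketch shows that when the object of interest covers very few pixels, a degenerate predictor that labels everything as background still reaches a high pixel accuracy, while an overlap measure such as the Dice score for the foreground class is zero.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label matches the ground truth."""
    correct = sum(p == t for p, t in zip(pred, target))
    return correct / len(target)


def dice_score(pred, target):
    """Dice overlap for the foreground class; 1.0 if both masks are empty."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 100x100 binary mask, flattened row by row:
# a 5x5 object covers 25 of the 10,000 pixels.
width = height = 100
target = [
    1 if 40 <= x < 45 and 40 <= y < 45 else 0
    for y in range(height)
    for x in range(width)
]

# A degenerate predictor that labels every pixel as background.
all_background = [0] * (width * height)

foreground_fraction = sum(target) / len(target)
accuracy = pixel_accuracy(all_background, target)
dice = dice_score(all_background, target)

print(f"foreground fraction: {foreground_fraction:.4f}")
print(f"pixel accuracy:      {accuracy:.4f}")
print(f"Dice (foreground):   {dice:.4f}")
```

In this toy setting the background-only prediction is correct on all but 25 pixels, which is why pixel accuracy alone says little about small-object segmentation quality.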
Graduation date

Spring 2018
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/R3TX35N29
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Jagersand, Martin (Computing Science)
- Cobzas, Dana (Computing Science)
Examining committee members and their departments
- Cobzas, Dana (Computing Science)
- Ray, Nilanjan (Computing Science)
- Jagersand, Martin (Computing Science)