Improving Semantic Image Segmentation by Object Localization

  • Author / Creator
    Zhang, Zichen
  • Abstract
    Semantic segmentation is the task of classifying every pixel in an image. In recent years, methods based on Fully Convolutional Networks (FCN) have dominated this field in terms of segmentation accuracy. We are interested in tackling the challenges that these methods face. First, it is expensive to acquire pixel-level labels to train the network. Second, FCN often has trouble with data in which positive and negative samples are imbalanced. This issue often comes up in domains such as medical imaging and satellite imagery analysis, where the object of interest can be very small. The large number of negative samples can overwhelm the positive samples during training, leading to a biased representation learned by the network.
    In this thesis, we investigate how an object localization mechanism can address these two challenges. We propose an end-to-end neural network that improves the segmentation accuracy of FCN by incorporating an object localization unit. This network performs object localization first, and the result is then used as a cue to guide the training of the segmentation network. The two steps share convolutional features. This allows us to leverage object detection labels to help train the segmentation network, alleviating the need for large-scale pixel-level labels. To avoid applying max pooling to object proposals, which limits spatial accuracy, we introduce a new type of convolutional layer named ROI convolution. It applies convolution directly to the object proposals in one shot, without passing them individually through the downstream network. We show that this layer is differentiable, which allows the network to be trained end-to-end.
    To demonstrate the efficacy of our method, we apply it to the problem of medical image segmentation. With the object localization unit, our method performs well despite the high class imbalance, and it outperforms existing methods on small object segmentation. To better understand the proposed method and the impact of ROI convolution, we also conducted ablation studies and ran experiments on an endoscopic image dataset with balanced data.
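
The class imbalance problem described in the abstract can be illustrated with a small sketch. This is not code from the thesis: the image size, the object size, and the helper names `pixel_accuracy` and `dice_score` are assumptions chosen for illustration. It shows why a model biased toward the negative class can look accurate while missing a small object entirely.

```python
# Illustrative sketch only (not from the thesis): a hypothetical 100x100 image,
# flattened to a list, in which the object of interest covers 50 pixels (0.5%).

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label matches the ground truth."""
    correct = sum(p == t for p, t in zip(pred, target))
    return correct / len(target)

def dice_score(pred, target):
    """Dice overlap between binary masks: 2 * |A and B| / (|A| + |B|)."""
    intersection = sum(p and t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2.0 * intersection / denom

n_pixels = 100 * 100
target = [1] * 50 + [0] * (n_pixels - 50)  # 1 = object, 0 = background

# A degenerate prediction that labels every pixel as background.
all_background = [0] * n_pixels

acc = pixel_accuracy(all_background, target)
dice = dice_score(all_background, target)
print(f"pixel accuracy: {acc:.3f}, Dice: {dice:.3f}")
```

Under these assumptions the all-background prediction is correct on 99.5% of pixels yet has no overlap with the object, which is the kind of bias a localization cue is meant to counteract.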

  • Subjects / Keywords
  • Graduation date
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
    • Department of Computing Science
  • Supervisor / co-supervisor and their department(s)
    • Jagersand, Martin (Computing Science)
    • Cobzas, Dana (Computing Science)
  • Examining committee members and their departments
    • Cobzas, Dana (Computing Science)
    • Ray, Nilanjan (Computing Science)
    • Jagersand, Martin (Computing Science)