Unpaired Document Image Denoising for OCR using BiLSTM enhanced CycleGAN

  • Author / Creator
    Singh, Katyani
  • Abstract
    Optical Character Recognition (OCR) models can perform poorly when document images suffer from various degradations. Supervised learning-based image enhancement methods can generate high-quality enhanced images, but they require corresponding clean images or ground truth text for training. Moreover, the paired data used to train these models is usually generated by adding different types of synthetic noise to clean images, whereas real-world noise is more complex and challenging than synthetic noise. To enhance real-world noisy images effectively, models must be trained on real noisy images. However, it is infeasible to obtain corresponding clean images for real-world noisy images, and creating ground truth text requires manual effort. Unsupervised methods explored in recent years have focused on enhancing natural scene images. For document images, preserving the readability of text in the enhanced images is of utmost importance for improved OCR performance. In this thesis, we explore enhancing document images in an unsupervised setting using unpaired training samples. To this end, we propose a modified architecture for the standard CycleGAN model that improves its performance in enhancing document images with better text preservation. The results indicate that the proposed model preserves text better and improves OCR performance compared to the standard CycleGAN model and classical unsupervised image preprocessing techniques such as Sauvola and Otsu.

  • Subjects / Keywords
  • Graduation date
    Fall 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-7hd4-0761
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.