Unknown-box Approximation to Improve Optical Character Recognition Performance

  • Author / Creator
    Ponnamperuma Arachchige, Ayantha Randika
  • Abstract
    Optical character recognition (OCR) is a widely used pattern recognition application in numerous domains. Several feature-rich commercial and open-source OCR solutions are available to consumers and provide moderate to excellent accuracy. These solutions are general-purpose by design so that they can serve a wide community. However, their accuracy can diminish on difficult or uncommon document domains. Preprocessing of document images can be used to minimize the effect of this domain shift. In this thesis, we investigate the possibility and the effect of using OCR engine feedback to train a preprocessor. The main obstacle in this approach is propagating the error signal through an opaque OCR engine. To circumvent this obstacle, we propose a novel preprocessor trained using gradient approximation. Unlike previous OCR-agnostic preprocessing techniques, the proposed training approach approximates a particular OCR engine's gradient and trains the preprocessor module, eliminating the need for intermediate labels. We compare two different methods to our proposed approach to establish a better training pipeline. Experiments with two different datasets and two OCR engines show that the presented preprocessor is able to improve the OCR engine's accuracy over the baseline by applying pixel-level manipulations to the document image.
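
    The idea of training a preprocessor through an opaque engine can be illustrated with a small sketch. The code below is not the thesis's method: it is a generic zeroth-order (perturbation-based) gradient estimator, shown only to make "gradient approximation" concrete. Everything in it is an assumption made for illustration: `opaque_engine_error` is a hypothetical stand-in for an OCR engine and its error metric, the 8-pixel "image" is toy data, and the preprocessor is reduced to a single gain and bias applied to every pixel.

```python
import random

# Hypothetical toy data: a "clean" 8-pixel image and a washed-out copy of it.
CLEAN = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
DEGRADED = [0.5 * p + 0.3 for p in CLEAN]


def opaque_engine_error(image):
    """Stand-in for an OCR engine plus its error metric (an assumption).

    The training loop only calls this function; it never inspects or
    differentiates its internals.
    """
    return sum((p - c) ** 2 for p, c in zip(image, CLEAN)) / len(CLEAN)


def preprocess(params, image):
    """Toy pixel-level preprocessor: one gain and one bias for all pixels."""
    gain, bias = params
    return [gain * p + bias for p in image]


def loss(params):
    """Error reported by the opaque engine on the preprocessed image."""
    return opaque_engine_error(preprocess(params, DEGRADED))


def estimate_gradient(params, rng, sigma=0.1, samples=20):
    """Approximate d(loss)/d(params) from function values alone,
    using antithetic Gaussian perturbations of the parameters."""
    grad = [0.0] * len(params)
    for _ in range(samples):
        eps = [rng.gauss(0.0, 1.0) for _ in params]
        plus = loss([p + sigma * e for p, e in zip(params, eps)])
        minus = loss([p - sigma * e for p, e in zip(params, eps)])
        scale = (plus - minus) / (2.0 * sigma)
        for i, e in enumerate(eps):
            grad[i] += scale * e / samples
    return grad


def train(steps=300, lr=0.2, seed=0):
    """Gradient descent on the preprocessor using the estimated gradient."""
    rng = random.Random(seed)
    params = [1.0, 0.0]  # start from the identity preprocessor
    for _ in range(steps):
        grad = estimate_gradient(params, rng)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


initial_loss = loss([1.0, 0.0])
trained_params = train()
final_loss = loss(trained_params)
print(f"error before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

    In this toy setting the engine's error on the preprocessed image drops below its error on the unprocessed image, even though the optimizer only ever sees the engine's output score. A real pipeline would replace the stand-in with calls to an actual OCR engine and the two-parameter preprocessor with a learned image-to-image model.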

  • Subjects / Keywords
  • Graduation date
    Fall 2021
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.