Bridging the Gap: OCR Techniques for Noisy and Distorted Texts

Authors

  • Sanjay Kumar Gorai, Post Graduate, Department of Computer Science, Kolhan University, West Singhbhum, Chaibasa-833201, India
  • Shekhar Pradhan, Post Graduate, Department of Computer Science, Kolhan University, West Singhbhum, Chaibasa-833201, India

DOI:

https://doi.org/10.32628/CSEIT2511111

Keywords:

Optical Character Recognition (OCR), Deep Learning, Convolutional Neural Networks (CNNs), Transformer Models, Vision Transformers (ViTs), Noisy Text Recognition, Historical Document Digitization, Scene Text Recognition, Multilingual OCR, Preprocessing Techniques, Post-Processing in OCR, Attention Mechanisms, Real-Time OCR Applications, Augmented Reality Integration, Explainable AI

Abstract

Optical Character Recognition (OCR) has evolved significantly over the years, enabling automated text extraction from a variety of sources. However, OCR systems often struggle with noisy and distorted texts, such as those found in low-quality scans, degraded historical documents, or images captured in challenging conditions. This paper explores state-of-the-art techniques and advancements in OCR for handling noisy and distorted texts. We discuss preprocessing methods, robust feature extraction, deep learning models, and post-processing techniques, providing a comprehensive overview of the field. Additionally, we analyse gaps in current research and propose future directions for developing more resilient OCR systems.

Published

13-01-2025

Section

Research Articles