Bridging the Gap: OCR Techniques for Noisy and Distorted Texts
DOI: https://doi.org/10.32628/CSEIT2511111
Keywords: Optical Character Recognition (OCR), Deep Learning, Convolutional Neural Networks (CNNs), Transformer Models, Vision Transformers (ViTs), Noisy Text Recognition, Historical Document Digitization, Scene Text Recognition, Multilingual OCR, Preprocessing Techniques, Post-Processing in OCR, Attention Mechanisms, Real-Time OCR Applications, Augmented Reality Integration, Explainable AI
Abstract
Optical Character Recognition (OCR) has evolved significantly over the years, enabling automated text extraction from a variety of sources. However, OCR systems often struggle with noisy and distorted texts, such as those found in low-quality scans, degraded historical documents, or images captured in challenging conditions. This paper explores state-of-the-art techniques and advancements in OCR for handling noisy and distorted texts. We discuss preprocessing methods, robust feature extraction, deep learning models, and post-processing techniques, providing a comprehensive overview of the field. Additionally, we analyse gaps in current research and propose future directions for developing more resilient OCR systems.
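As a concrete illustration of the kind of preprocessing the paper surveys for noisy scans, the sketch below applies median denoising followed by Otsu binarization using OpenCV. It is a minimal example under assumed settings: the file names and the filter size are illustrative choices, not values taken from the paper.

```python
# Minimal preprocessing sketch for a noisy grayscale scan (assumes OpenCV is installed).
# File names and the median-filter kernel size are illustrative assumptions.
import cv2

# Load the scan as a single-channel grayscale image.
image = cv2.imread("noisy_scan.png", cv2.IMREAD_GRAYSCALE)

# Suppress salt-and-pepper style scanner noise with a small median filter.
denoised = cv2.medianBlur(image, 3)

# Otsu's method selects a global threshold from the gray-level histogram,
# separating ink from background without a hand-tuned threshold value.
_, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite("binarized_scan.png", binary)
```

Real pipelines would typically add further steps discussed in the paper, such as deskewing and contrast normalization, before passing the image to a recognition model.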
License
Copyright (c) 2025 International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.