Bridging the Gap: OCR Techniques for Noisy and Distorted Texts

Sanjay Kumar Gorai; Shekhar Pradhan

doi:10.32628/CSEIT2511111

Authors

Sanjay Kumar Gorai, Post Graduate, Department of Computer Science, Kolhan University, West Singhbhum, Chaibasa-833201, India
Shekhar Pradhan, Post Graduate, Department of Computer Science, Kolhan University, West Singhbhum, Chaibasa-833201, India

DOI:

https://doi.org/10.32628/CSEIT2511111

Keywords:

Optical Character Recognition (OCR), Deep Learning, Convolutional Neural Networks (CNNs), Transformer Models, Vision Transformers (ViTs), Noisy Text Recognition, Historical Document Digitization, Scene Text Recognition, Multilingual OCR, Preprocessing Techniques, Post-Processing in OCR, Attention Mechanisms, Real-Time OCR Applications, Augmented Reality Integration, Explainable AI

Abstract

Optical Character Recognition (OCR) has evolved significantly over the years, enabling automated text extraction from a variety of sources. However, OCR systems often struggle with noisy and distorted texts, such as those found in low-quality scans, degraded historical documents, or images captured in challenging conditions. This paper explores state-of-the-art techniques and advancements in OCR for handling noisy and distorted texts. We discuss preprocessing methods, robust feature extraction, deep learning models, and post-processing techniques, providing a comprehensive overview of the field. Additionally, we analyse gaps in current research and propose future directions for developing more resilient OCR systems.
