Balochi Text Segmentation for Establishing Balochi OCR

Authors

  • Muhammad Mazhar, Department of Computer Science and Technology, Faculty of Information Science and Technology, Ocean University of China
  • Qinbo, Department of Computer Science and Technology, Faculty of Information Science and Technology, Ocean University of China
  • Dil Nawaz Hakro, Faculty of Engineering and Technology (FET), University of Sindh, Jamshoro, Pakistan
  • Mashooque Ali Mahar, Institute of Computer Sciences, Shah Abdul Latif University, Khairpur Mirs, Pakistan
  • Abdul Majid, Department of Computer Science and Technology, Faculty of Information Science and Technology, Ocean University of China
  • Basit Ali, Department of Computer Science and Technology, Faculty of Information Science and Technology, Ocean University of China

DOI:

https://doi.org/10.32628/CSEIT2410461

Keywords:

Handwritten Text, Text Recognition, Balochi Language, OCR

Abstract

Optical character recognition (OCR) is considered the fastest method of data entry; the automatic conversion of handwritten text into digital form is called handwritten text recognition. Many languages already have OCR systems, but some still lack one. Balochi is one of the national languages of Pakistan, and most of its speakers live in the Baluchistan province of Pakistan. Balochi computing is in its infancy and requires attention on many fronts to reach the level of other languages, especially in matters of computation. This paper investigates the relationship between Balochi and other languages that have adopted the Arabic script, and proposes a segmentation algorithm that segments Balochi text paragraphs into lines, lines into words, and words into characters. The algorithm has been adopted and fine-tuned to achieve an accuracy of 95%. The segmentation algorithm will play a role in developing a complex OCR and handwriting recognition system for the Balochi language.
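
The abstract does not state how the segmentation is carried out. As a rough illustration of the paragraph-to-line and line-to-word stages only, the sketch below uses projection profiles, a common technique for Arabic-script text that is assumed here and is not confirmed as the authors' algorithm. The function names, the `min_gap` threshold, and the convention that ink pixels equal 1 are all hypothetical choices. Word-to-character segmentation is left out, since cursive joining makes it considerably harder than simple gap detection.

```python
import numpy as np


def find_runs(profile, min_gap=1):
    """Return (start, end) pairs of runs where profile > 0.

    Runs separated by fewer than min_gap empty positions are merged.
    """
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def segment_lines(binary):
    """Split a binary page (1 = ink, 0 = background) into line images.

    Uses the horizontal projection profile (sum of ink per row).
    """
    row_profile = binary.sum(axis=1)
    return [binary[top:bottom] for top, bottom in find_runs(row_profile)]


def segment_words(line, min_gap=3, rtl=True):
    """Split a line image into word images.

    Uses the vertical projection profile (sum of ink per column).
    Gaps narrower than min_gap columns are assumed to lie inside a word.
    With rtl=True the words are returned in right-to-left reading order.
    """
    col_profile = line.sum(axis=0)
    words = [line[:, left:right] for left, right in find_runs(col_profile, min_gap)]
    return words[::-1] if rtl else words
```

In practice the input would come from a binarized scan, and `min_gap` would need tuning per font and resolution; the value used here is only a placeholder.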

Published

28-11-2024

Section

Research Articles
