Thinning of Balti Script: Way Forward to Balti OCR

Authors

  • Dil Nawaz Hakro, Department of Software Engineering, FET, University of Sindh, Jamshoro, Pakistan
  • Abdul Majid, Department of Computer Science and Technology, Faculty of Information Science and Engineering, Ocean University of China
  • Muhammad Nadeem, Department of Computer Science and Technology, Faculty of Information Science and Engineering, Ocean University of China
  • Mashooq Ali Mahar, Institute of Computer Science and Technology, Shah Abdul Latif University Khairpur Mirs
  • Qinbo, Department of Computer Science and Technology, Faculty of Information Science and Engineering, Ocean University of China
  • Dilawar Khan, Department of Computer Science and Technology, Faculty of Information Science and Engineering, Ocean University of China
  • Saba Brahmani, Department of Computer Science and Technology, Faculty of Information Science and Engineering, Ocean University of China

DOI:

https://doi.org/10.32628/CSEIT2410428

Keywords:

Balti Language, Balti script, OCR, ICR, Thinning, Feature extraction

Abstract

Natural language processing is an application of artificial intelligence that trains machines to perform tasks in human language. Optical character recognition (OCR) is one such field: it converts text images into editable text and so removes the effort of retyping. An OCR system may include preprocessing and postprocessing steps that make the text image more suitable for the rest of the OCR process. Thinning is a preprocessing step in which characters, words, and text are reduced to a one-pixel-wide skeleton. Much work has been done for various languages of the world, including Pakistani languages, but work on Balti OCR is nonexistent. In this study, a thinning algorithm is proposed for Balti, a language spoken in the northern areas of Pakistan and India. The proposed algorithm was tested on hundreds of Balti text images and produced accurate results, yielding a one-pixel skeleton of each input image; selected results are presented in this paper. This research opens several directions, including building a Balti OCR and a Balti ICR (both segmentation-based and segmentation-free).
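The abstract does not spell out the proposed thinning algorithm, so the following is only an illustrative sketch of what reducing a stroke to a one-pixel skeleton can look like. It uses the classic Zhang-Suen parallel thinning scheme, not the authors' method. The function name `zhang_suen_thin`, the convention that 1 marks ink and 0 marks background, and the toy test bar are all assumptions made for this example.

```python
import numpy as np


def zhang_suen_thin(image):
    """Thin a binary image (1 = ink, 0 = background) with Zhang-Suen.

    Illustrative only: this is the classic two-subiteration scheme,
    not the algorithm proposed in the paper.
    """
    # Pad with a zero border so every ink pixel has eight neighbours.
    img = np.pad(np.asarray(image, dtype=np.uint8), 1)
    rows, cols = img.shape
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r, c] == 0:
                        continue
                    # Neighbours P2..P9, clockwise starting from north.
                    p = [int(v) for v in (
                        img[r - 1, c], img[r - 1, c + 1], img[r, c + 1],
                        img[r + 1, c + 1], img[r + 1, c], img[r + 1, c - 1],
                        img[r, c - 1], img[r - 1, c - 1])]
                    b = sum(p)  # number of ink neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1
                            for i in range(8))
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_delete.append((r, c))
            # Delete in parallel: all decisions used the same snapshot.
            for r, c in to_delete:
                img[r, c] = 0
            changed = changed or bool(to_delete)
    return img[1:-1, 1:-1]  # strip the padding


if __name__ == "__main__":
    # Toy input: a horizontal stroke three pixels thick.
    bar = np.zeros((5, 9), dtype=np.uint8)
    bar[1:4, 1:8] = 1
    print(zhang_suen_thin(bar))
```

In a real pipeline the input would first be binarized, and the per-pixel Python loops would likely be replaced by a vectorized or library implementation for page-sized images.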

Published

01-11-2024

Section

Research Articles
