Thinning of Balti Script : Way Forward to Balti OCR
DOI:
https://doi.org/10.32628/CSEIT2410428
Keywords:
Balti Language, Balti script, OCR, ICR, Thinning, Feature extraction
Abstract
Natural language processing is one of the applications of Artificial Intelligence; it trains machines to perform tasks in human language. Optical character recognition (OCR) is one such field: it removes the effort of retyping by converting text images into editable text. An OCR system may include preprocessing and postprocessing steps that make the text image more suitable for the rest of the OCR process. Thinning is a preprocessing step in which characters, words and text are reduced to a one-pixel-wide skeleton. Much work has been done for various languages of the world, including Pakistani languages, but work on Balti OCR is nonexistent. In this study, a thinning algorithm is proposed for the Balti language, a language spoken in the northern areas of Pakistan and India. The proposed algorithm was tested on hundreds of Balti language images and produced accurate results, giving a one-pixel skeleton of each input image; selected results are presented in this paper. The current research opens many directions, including a way forward to building Balti OCR and Balti ICR (both segmentation-based and segmentation-free).
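To make the idea of reducing a character to a one-pixel skeleton concrete, the sketch below implements the classical Zhang-Suen parallel thinning algorithm (1984) in plain Python. It is an illustration only: the abstract does not describe the steps of the proposed Balti algorithm, so this stand-in should not be read as the authors' method. The function name `zhang_suen_thin`, the list-of-lists image representation (1 = ink, 0 = background) and the small test bar are all assumptions made for the example.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of rows of 0/1, 1 = ink); returns a new grid.

    Illustrative Zhang-Suen thinning, not the algorithm proposed in the paper.
    """
    rows, cols = len(image), len(image[0])
    # Pad with a background border so every pixel has eight neighbours.
    img = ([[0] * (cols + 2)]
           + [[0] + list(row) + [0] for row in image]
           + [[0] * (cols + 2)])

    def neighbours(r, c):
        # P2..P9, clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    def transitions(n):
        # Count 0 -> 1 steps in the circular sequence P2, P3, ..., P9, P2.
        return sum(1 for a, b in zip(n, n[1:] + n[:1]) if a == 0 and b == 1)

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows + 1):
                for c in range(1, cols + 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    # Keep end points, interior pixels and junction pixels.
                    if not (2 <= sum(n) <= 6 and transitions(n) == 1):
                        continue
                    p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
                    if step == 0:
                        # First sub-iteration: peel south/east borders.
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        # Second sub-iteration: peel north/west borders.
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_clear.append((r, c))
            # Delete in parallel: only after the whole image was examined.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    # Strip the padding before returning.
    return [row[1:-1] for row in img[1:-1]]


if __name__ == "__main__":
    # Hypothetical input: a 3-pixel-thick horizontal stroke.
    bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
    for row in zhang_suen_thin(bar):
        print("".join("#" if v else "." for v in row))
```

The two sub-iterations peel pixels from opposite sides of a stroke, and deletions are applied only after a full scan, which is what lets the stroke shrink evenly. A stroke that is already one pixel wide is left unchanged, so the function can be applied safely to images that were thinned earlier.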
License
Copyright (c) 2024 International Journal of Scientific Research in Computer Science, Engineering and Information Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.