Zone-Wise Segmentation and Lexicon-Driven Recognition for Printed Myanmar Characters

Authors : Chit San Lwin, Xiangqian Wu

This paper presents new segmentation and recognition algorithms for Myanmar script in offline printed images. Zone segmentation considers both horizontal and vertical zones and is applied to segment letters according to their roles, such as primary or peripheral characters. Statistical and structural features of the segmented characters are then extracted and exploited in the recognition process. A hidden Markov model is used to recognize primary characters, while a Kohonen self-organizing map is used for peripheral characters. The characters recognized by each model are then combined and finally recognized by a k-nearest neighbors algorithm with the help of a lexicon composed of all common Myanmar characters. Our OCR system for Myanmar characters was tested on a dataset of approximately 7,560 compound characters. The results show that our system achieves significantly better segmentation and recognition performance than other contemporary Myanmar OCR approaches.
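
The abstract does not specify how the zones are found or how the lexicon lookup is scored, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that horizontal zones come from a row projection profile (the middle, primary-character band spans the first to last row whose ink count reaches a hypothetical fraction `frac` of the densest row), that vertical zones are runs of non-blank columns within a band, and that the final step is a k-nearest-neighbours search ranking lexicon entries by Hamming distance over (upper, primary, lower) component labels. The HMM and SOM classifiers are not reproduced; a hand-written candidate tuple stands in for their combined output. All function names, labels, the toy image, and the lexicon are hypothetical.

```python
"""Hypothetical sketch: projection-profile zone segmentation plus lexicon k-NN lookup."""


def horizontal_zones(img, frac=0.5):
    """Split a binary image (list of rows, 1 = ink) into upper, middle, lower row ranges.

    Assumption: the middle (primary-character) zone spans from the first to the
    last row whose ink count is at least `frac` (0 < frac <= 1) times that of the
    densest row. Each range is returned as a half-open (start, end) pair.
    """
    profile = [sum(row) for row in img]  # horizontal projection profile
    peak = max(profile)
    if peak == 0:
        raise ValueError("image contains no ink")
    dense = [i for i, v in enumerate(profile) if v >= frac * peak]
    top, bottom = dense[0], dense[-1]
    return (0, top), (top, bottom + 1), (bottom + 1, len(img))


def vertical_segments(img, rows):
    """Return half-open column ranges of ink separated by blank columns,
    using only the rows in the half-open range `rows`."""
    r0, r1 = rows
    width = len(img[0])
    # Vertical projection profile restricted to one horizontal zone.
    cols = [sum(img[r][c] for r in range(r0, r1)) for c in range(width)]
    segments, start = [], None
    for c, v in enumerate(cols):
        if v and start is None:
            start = c  # a run of ink columns begins
        elif not v and start is not None:
            segments.append((start, c))  # a blank column closes the run
            start = None
    if start is not None:
        segments.append((start, width))  # run touching the right edge
    return segments


def knn_lexicon(candidate, lexicon, k=3):
    """Return the k lexicon entries nearest to `candidate`.

    Entries and the candidate are (upper, primary, lower) label tuples; the
    distance is the number of mismatched components (Hamming distance).
    Ties keep lexicon order because sorted() is stable.
    """
    def hamming(entry):
        return sum(a != b for a, b in zip(candidate, entry))

    return sorted(lexicon, key=hamming)[:k]


# Toy 5x7 binary image: marks above, two primary glyphs, a mark below.
IMG = [
    [0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 0, 0],
]

# Hypothetical lexicon of (upper, primary, lower) component labels.
LEXICON = [
    ("none", "ka", "none"),
    ("i", "ka", "none"),
    ("i", "ka", "u"),
    ("none", "kha", "u"),
]

if __name__ == "__main__":
    upper, middle, lower = horizontal_zones(IMG)
    print("zones:", upper, middle, lower)
    print("primary columns:", vertical_segments(IMG, middle))
    # Pretend the primary classifier read "kha" and the peripheral one read upper "i", lower "none".
    print("best match:", knn_lexicon(("i", "kha", "none"), LEXICON, k=1))
```

In this toy run the misread primary label "kha" is pulled back to the lexicon entry ("i", "ka", "none"), since that entry differs from the candidate in only one component while every other entry differs in two or more. This illustrates, under the assumptions above, how a lexicon can constrain the combined output of separate primary and peripheral classifiers.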

Authors and Affiliations

Chit San Lwin
Department of Mathematics, Monywa University, Monywa City, Sagaing Region, Myanmar
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, P. R. China
Xiangqian Wu
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, P. R. China

Keywords : Character Segmentation, Hidden Markov Model, Self-Organizing Map, k-Nearest Neighbors, Lexicon

Publication Details

Published in : Volume 3 | Issue 8 | November-December 2018
Date of Publication : 2018-11-30
License : This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 161-180
Manuscript Number : CSEIT183844
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Chit San Lwin, Xiangqian Wu, "Zone-Wise Segmentation and Lexicon-Driven Recognition for Printed Myanmar Characters", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 3, Issue 8, pp.161-180, November-December-2018. Available at doi : https://doi.org/10.32628/CSEIT183844
Journal URL : https://res.ijsrcseit.com/CSEIT183844
