Review of Advances in Digital Recognition of Indic Scripts

Authors

  • Rajesh Kumar Maurya, University Department of Computer Science, University of Mumbai, Mumbai, India

Keywords:

OCR, Indic Scripts, Pattern Recognition, Character Recognition

Abstract

Digital content creation and document management in Indian languages are still at a developing stage. OCR has become an administrative requirement for effective governance and for everyday activities. Scripts ranging from medieval to contemporary times are of literary and political importance. Current research initiatives highlight the importance of, and the need for, efforts to recognize printed and handwritten documents written in languages of Indian origin. This paper reviews the state of the various scripts in use, from the medieval to the present era. It explores the prospects for digital recognition of handwritten and printed texts, and thereby points towards future trends in developing restoration software for Indic scripts. OCR systems for Indic scripts such as Devanagari have attained good results and continue to improve in accuracy. By contrast, very few attempts have been made for many medieval and ancient scripts. The challenge stems from the number of languages and the diversity of their scripts. The scarcity of digitized linguistic resources makes the task harder still. The paper also highlights the characteristics of scripts of Indic origin and the challenges in recognizing them. So far, digital recognition has largely been limited to simple numerals and isolated characters. The paper enumerates the highest reported performance of OCR attempts for important Indic scripts. It also suggests possible approaches, including statistical and soft computing methods, for recognizing scripts, particularly those of medieval times.

Published

2018-02-28

Section

Research Articles

How to Cite

Rajesh Kumar Maurya, "Review of Advances in Digital Recognition of Indic Scripts", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 1, pp. 991-1009, January-February 2018.