Parts of Speech Tagger for Pali Language

Authors

  • Yashodhara Haribhakta  Department of Computer Engineering and IT, College of Engineering, Pune, Maharashtra, India
  • Laxmi Nadageri  Department of Computer Engineering and IT, College of Engineering, Pune, Maharashtra, India

Keywords:

Parts of Speech tagging, tagger, Rule based tagger, Pali language.

Abstract

Parts of Speech tagging is the process of labelling each word in a text with its appropriate grammatical category, such as noun, verb, adjective, adverb, or pronoun. It is an essential requirement for natural language processing and is a simple statistical model used in many Natural Language Processing applications. In this paper, we propose a parts of speech tagger for the Pali language. Pali, an Indo-Aryan language, is considered extinct but has a very rich literature comprising works on logic, history, medicine, pharmacology, and other subjects. The Pali tagger is developed using a rule-based approach. The paper also presents the tagset used for Pali and reports the performance of the proposed rule-based tagger on a dataset of up to 300 sentences (1000 words). Two learning algorithms, Support Vector Machine and Decision Tree, are used to measure performance on the tagged Pali corpus.

Published

2017-08-31

Section

Research Articles

How to Cite

[1]
Yashodhara Haribhakta, Laxmi Nadageri, "Parts of Speech Tagger for Pali Language", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 4, pp. 845-853, July-August 2017.