A Study on Recent Issues in Text Pre-processing and Classification Techniques

Authors

  • Dr. Ramalingam Sugumar  Professor & Deputy Director, Christhu Raj College, Tamil Nadu, India

Keywords:

Text Data Mining, Text Mining, Stemming algorithms, Classification algorithms, Truncating, Statistical.

Abstract

Data mining is a significant research area in computer science. Applying data mining techniques to textual data sets is known as Textual Data Mining (TDM). TDM consists of two stages: pre-processing and post-processing. Pre-processing faces issues at several of its steps, such as tokenization, stop word removal, and stemming. Stemming is a text mining pre-processing technique that removes inflectional and derivational endings in order to reduce word forms to a common stem. It is used in text processing tasks including information retrieval, text mining, and natural language processing. This study discusses recent issues in text pre-processing and classification algorithms in text mining, and shows the merits and demerits of text mining techniques.
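The three pre-processing steps named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list, the suffix list, and the function names (`tokenize`, `remove_stop_words`, `naive_stem`, `preprocess`) are hypothetical and are not taken from the paper, and the stemmer is a simple truncating suffix stripper, not the Porter or Lovins algorithm.

```python
import re

# Illustrative stop-word list (an assumption); real systems use far larger lists.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Illustrative suffixes, tried in this order (longer ones first).
# This is NOT the rule set of Porter, Lovins, or any published stemmer.
SUFFIXES = ("ions", "ion", "ing", "ed", "es", "s")


def tokenize(text):
    """Tokenization: lowercase the text and keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Stop word removal: drop tokens that appear in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token, min_stem_len=3):
    """Stemming by truncation: strip the first matching suffix,
    provided at least min_stem_len characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem_len:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenization, stop word removal, and stemming in sequence."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The connected connections are connecting"))
```

With these assumed rules, "connected", "connections", and "connecting" all reduce to the stem "connect". The same rules also turn "mining" into "min", a small example of the over-stemming problem that makes the choice of stemming algorithm an open issue.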

Published

2018-02-28

Section

Research Articles

How to Cite

[1]
Dr. Ramalingam Sugumar, "A Study on Recent Issues in Text Pre-processing and Classification Techniques", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 1, pp. 1813-1817, January-February 2018.