Relevance Feature Discovery for Text Mining Using Feature Clustering

Authors

  • Mohan I  Department of Information Technology, Prathyusha Engineering College, Anna University, Tiruvallur, Tamil Nadu, India
  • Ajith Kumar C  Department of Information Technology, Prathyusha Engineering College, Anna University, Tiruvallur, Tamil Nadu, India
  • Ajith Kumar B  Department of Information Technology, Prathyusha Engineering College, Anna University, Tiruvallur, Tamil Nadu, India
  • Bhuvanesh S  Department of Information Technology, Prathyusha Engineering College, Anna University, Tiruvallur, Tamil Nadu, India

Keywords:

Text Features Classification, Fclustering, Pattern-Based Approach, Term-Based Approach, Feature Discovery.

Abstract

It is difficult to guarantee the quality of the relevance features discovered in text mining because of the large number of patterns in the data. Most popular text mining and classification methods adopt term-based approaches, which suffer from the problems of polysemy and synonymy. Pattern-based approaches yield better results than term-based approaches, so this paper adopts a pattern-based approach for large sets of text patterns. The approach discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It uses a clustering technique to separate relevant documents from irrelevant ones. It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics and Reuters-21578 show that the proposed model significantly outperforms both the state-of-the-art term-based methods and the pattern-based methods.
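
The term classification and weight update described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the specificity score (share of relevant documents containing a term minus the share of irrelevant documents containing it), the thresholds `upper` and `lower`, the three category names, and the weight-update rule. The function names `term_specificity`, `classify_terms` and `update_weights` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def term_specificity(relevant_docs, irrelevant_docs):
    """Assumed score: fraction of relevant docs containing the term
    minus fraction of irrelevant docs containing it (range -1 to 1)."""
    rel_df = Counter(t for doc in relevant_docs for t in set(doc))
    irr_df = Counter(t for doc in irrelevant_docs for t in set(doc))
    n_rel = max(len(relevant_docs), 1)  # avoid division by zero
    n_irr = max(len(irrelevant_docs), 1)
    terms = set(rel_df) | set(irr_df)
    return {t: rel_df[t] / n_rel - irr_df[t] / n_irr for t in terms}


def classify_terms(specificity, upper=0.5, lower=-0.5):
    """Split terms into three categories using assumed thresholds."""
    categories = {"positive": set(), "general": set(), "negative": set()}
    for term, score in specificity.items():
        if score > upper:
            categories["positive"].add(term)
        elif score < lower:
            categories["negative"].add(term)
        else:
            categories["general"].add(term)
    return categories


def update_weights(base_weights, specificity, categories):
    """Assumed rule: boost positive terms, penalize negative terms,
    leave general terms at their base weight."""
    weights = {}
    for term, score in specificity.items():
        base = base_weights.get(term, 0.0)
        if term in categories["positive"]:
            weights[term] = base * (1.0 + score)
        elif term in categories["negative"]:
            weights[term] = base - abs(score)
        else:
            weights[term] = base
    return weights
```

For example, with the relevant documents `[["oil", "price", "market"], ["oil", "export", "market"]]` and the irrelevant documents `[["football", "market"], ["football", "score"]]`, the sketch places `oil` in the positive category and `football` in the negative category. It treats `market` as general because the term appears on both sides.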

Published

2017-04-30

Section

Research Articles

How to Cite

[1]
Mohan I, Ajith Kumar C, Ajith Kumar B, Bhuvanesh S, "Relevance Feature Discovery for Text Mining Using Feature Clustering", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 2, pp. 661-665, March-April 2017.