Relevance Feature Discovery for Text Mining Using Feature Clustering

Authors: Mohan I, Ajith Kumar C, Ajith Kumar B, Bhuvanesh S

Guaranteeing the quality of relevance features discovered in text documents is difficult because of the large number of terms and data patterns involved. Most popular text mining and classification methods have adopted term-based approaches, but these all suffer from the problems of polysemy and synonymy. Pattern-based approaches yield better results than term-based approaches, so this paper implements a pattern-based approach for large collections of text patterns. The approach discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). A clustering technique is used to separate relevant from irrelevant documents. The model also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics, and Reuters-21578 show that the proposed model significantly outperforms both state-of-the-art term-based methods and pattern-based methods.
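The abstract describes two steps: deploying discovered patterns onto individual terms, then grouping terms by how specific they are to relevant versus irrelevant documents and updating their weights. The Python sketch below illustrates that idea under explicit assumptions. The even split of pattern support across a pattern's terms, the `specificity` score, the thresholds `low` and `high`, and the reweighting rules are all hypothetical choices for illustration, not the authors' formulas. In particular, the three-way split here is plain thresholding standing in for the paper's clustering step.

```python
"""Hypothetical sketch: deploy weighted patterns onto terms, then group
terms by an assumed specificity score and adjust their weights."""
from collections import defaultdict


def deploy_patterns(patterns):
    # patterns: list of (frozenset_of_terms, support) pairs.
    # Assumption: each pattern's support is split evenly among its terms,
    # and a term's weight is the sum of the shares it receives.
    weights = defaultdict(float)
    for terms, support in patterns:
        share = support / len(terms)
        for term in terms:
            weights[term] += share
    return dict(weights)


def specificity(term, pos_docs, neg_docs):
    # Assumed score in [-1, 1]: fraction of relevant (positive) documents
    # containing the term minus fraction of irrelevant (negative) ones.
    pos = sum(term in doc for doc in pos_docs) / len(pos_docs)
    neg = sum(term in doc for doc in neg_docs) / len(neg_docs)
    return pos - neg


def classify_and_reweight(weights, pos_docs, neg_docs, low=-0.2, high=0.2):
    # Hypothetical thresholds `low` and `high` split terms into three groups.
    groups, updated = {}, {}
    for term, w in weights.items():
        spe = specificity(term, pos_docs, neg_docs)
        if spe > high:
            groups[term] = "positive_specific"
            updated[term] = w * (1 + spe)  # boost terms typical of relevant docs
        elif spe < low:
            groups[term] = "negative_specific"
            updated[term] = w * spe  # spe < 0, so the weight turns negative
        else:
            groups[term] = "general"
            updated[term] = w  # general terms keep their deployed weight
    return groups, updated


if __name__ == "__main__":
    # Toy data: documents are sets of terms; patterns carry a support count.
    pos_docs = [{"oil", "price", "opec"}, {"oil", "export"}, {"price", "market"}]
    neg_docs = [{"price", "market", "stock"}, {"market", "stock"}]
    patterns = [
        (frozenset({"oil", "price"}), 2),
        (frozenset({"oil"}), 3),
        (frozenset({"market"}), 1),
    ]
    weights = deploy_patterns(patterns)
    groups, updated = classify_and_reweight(weights, pos_docs, neg_docs)
    print(groups)
    print(updated)
```

With the toy data in the `__main__` block, `oil` is classed as positive-specific and its weight is boosted, `price` stays general, and `market`, which appears in every irrelevant document, ends up with a negative weight.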

Authors and Affiliations

Mohan I
Department of Information Technology, Prathyusha Engineering College, Anna University, Tiruvallur, Tamil Nadu, India
Ajith Kumar C
Department of Information Technology, Prathyusha Engineering College, Anna University, Tiruvallur, Tamil Nadu, India
Ajith Kumar B
Department of Information Technology, Prathyusha Engineering College, Anna University, Tiruvallur, Tamil Nadu, India
Bhuvanesh S
Department of Information Technology, Prathyusha Engineering College, Anna University, Tiruvallur, Tamil Nadu, India

Keywords: Text Features Classification, FClustering, Pattern-Based Approach, Term-Based Approach, Feature Discovery.

Publication Details

Published in : Volume 2 | Issue 2 | March-April 2017
Date of Publication : 2017-04-30
License : This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 661-665
Manuscript Number : CSEIT1722197
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Mohan I, Ajith Kumar C, Ajith Kumar B, Bhuvanesh S, "Relevance Feature Discovery for Text Mining Using Feature Clustering", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 2, pp. 661-665, March-April 2017.
Journal URL : http://ijsrcseit.com/CSEIT1722197
