Relevance Feature Discovery for Text Mining Using Feature Clustering
Keywords:
Text Features Classification, Fclustering, Pattern-Based Approach, Term-Based Approach, Feature Discovery
Abstract
Achieving high-quality relevance feature discovery in text mining is difficult because of the large number of patterns in the data. Most popular text mining and classification methods adopt term-based approaches, which suffer from the problems of polysemy and synonymy. Pattern-based approaches yield better results than term-based approaches, so this paper implements a pattern-based approach for large collections of text patterns. The approach discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It uses a clustering technique to separate relevant documents from irrelevant ones. It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics and Reuters-21578 show that the proposed model significantly outperforms both state-of-the-art term-based methods and pattern-based methods.
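The term categorization step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the specificity formula or any thresholds, so everything below is an assumption made for illustration: documents are modelled as sets of terms, specificity is taken as the number of relevant documents containing a term minus the number of irrelevant documents containing it, divided by the number of relevant documents, and the function names and the `low` and `high` cut-offs are hypothetical.

```python
def term_specificity(term, relevant_docs, irrelevant_docs):
    """Assumed specificity score: (relevant hits - irrelevant hits) / |relevant|."""
    if not relevant_docs:
        raise ValueError("at least one relevant document is required")
    pos = sum(1 for doc in relevant_docs if term in doc)
    neg = sum(1 for doc in irrelevant_docs if term in doc)
    return (pos - neg) / len(relevant_docs)


def categorize_terms(relevant_docs, irrelevant_docs, low=-0.2, high=0.5):
    """Label each term 'positive', 'general' or 'negative'.

    The thresholds `low` and `high` are hypothetical values chosen for
    this example; they are not taken from the paper.
    """
    vocabulary = set().union(*relevant_docs, *irrelevant_docs)
    categories = {}
    for term in vocabulary:
        spe = term_specificity(term, relevant_docs, irrelevant_docs)
        if spe > high:
            categories[term] = "positive"
        elif spe < low:
            categories[term] = "negative"
        else:
            categories[term] = "general"
    return categories


# Toy data: each document is a set of terms (illustrative only).
relevant = [
    {"mining", "pattern", "text"},
    {"pattern", "feature", "text"},
    {"pattern", "mining"},
]
irrelevant = [
    {"text", "football"},
    {"football", "score"},
]

for term, label in sorted(categorize_terms(relevant, irrelevant).items()):
    print(term, label)
```

In this toy data, a term that appears only in relevant documents lands in the positive category, a term that appears only in irrelevant documents lands in the negative category, and a term shared by both groups tends toward the general category. A weight update along the lines the abstract describes would then treat the three groups differently, but the abstract does not specify how.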
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.