Survey on Feature Selection for Text Categorization

Authors

  • Sonali Suskar  Department of Computer Engineering, SIT College of Engineering, Lonavala, Maharashtra, India
  • Dr. S. D. Babar  Department of Computer Engineering, SIT College of Engineering, Lonavala, Maharashtra, India

Keywords:

Classification, text categorization, feature selection, training data.

Abstract

The volume of available text data is now so large that text categorization has become an important problem. Given a set of documents that have already been organized into classes, new documents can be classified automatically. The filter approach to feature selection is predominantly used in text categorization because of its simplicity and efficiency. However, the filter approach evaluates the goodness of a feature using only the intrinsic characteristics of the training data, without considering the learning algorithm used for discrimination, which may lead to poor classification performance. Moreover, for a specific learning algorithm, it is hard to determine which filter feature selection approach is best for discrimination. This survey focuses on feature selection techniques used for text categorization. It also presents a comparative analysis of recent techniques along with their limitations.
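As an illustration of what a filter approach looks like in practice, the following minimal sketch scores each term with the chi-square statistic, a commonly used filter criterion, and keeps the top-scoring terms. It is not taken from any of the surveyed methods: the toy corpus, the class names, and the helper names `chi_square_scores` and `select_top_k` are assumptions made for the example. Note that no classifier appears anywhere in the scoring, which is exactly the property the abstract describes.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Score each term by its chi-square statistic against one class.

    docs: list of token lists; labels: class name per document;
    target: the class of interest (all other classes are pooled).
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos

    # Document frequencies of each term inside and outside the target class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            (df_pos if y == target else df_neg)[term] += 1

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # target documents containing the term
        b = df_neg[term]   # other documents containing the term
        c = n_pos - a      # target documents without the term
        d = n_neg - b      # other documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus, for illustration only.
docs = [
    "cheap pills buy now".split(),
    "buy cheap watches now".split(),
    "meeting agenda for monday".split(),
    "monday project meeting notes".split(),
]
labels = ["spam", "spam", "ham", "ham"]

scores = chi_square_scores(docs, labels, "spam")
top = select_top_k(scores, 4)
print(top)
```

Because the scores depend only on term and class counts in the training data, the same ranking would be handed to any downstream learner, whether or not those terms are the most useful ones for that particular learner.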

Published

2018-04-30

Section

Research Articles

How to Cite

[1]
Sonali Suskar, Dr. S. D. Babar, "Survey on Feature Selection for Text Categorization", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 4, pp. 261-266, March-April 2018.