Survey on Classification Approach for Text Categorization

Authors

  • Rupali Patil, Department of Computer Engineering, Rajashri Shahu College of Engineering, Savitribai Phule Pune University, Pune, India
  • Ms. V. M. Barkade, Department of Computer Engineering, Rajashri Shahu College of Engineering, Savitribai Phule Pune University, Pune, India

Keywords:

Text categorization, class-specific features, feature selection, PDF projection and estimation, dimension reduction, J48, term weighting.

Abstract

In information retrieval, text categorization has recently become an active research topic. The goal of text categorization is to assign a document one or more categories from a predefined set. A key challenge is learning in a very high-dimensional feature space: such features can impose a heavy computational burden and can even hurt classification performance, because many of them are irrelevant or redundant. To mitigate this 'curse of dimensionality' and to speed up classifier training, it is important to perform feature reduction, which shrinks the feature set. This paper introduces a Bayesian classification approach and the J48 classifier for automatic text categorization using class-specific features. The proposed strategy selects a specific feature subset for each class. A notable advantage of this approach is that most feature selection criteria, for example Information Gain (IG) and Maximum Discrimination (MD), can be easily incorporated into it. The J48 classifier saves time and memory. The proposed system also uses term weighting during preprocessing. Together, these methods increase the accuracy of feature selection and classification and improve overall system performance.
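The idea of choosing a separate feature subset for every class can be illustrated with a minimal sketch. It assumes a one-vs-rest formulation in which Information Gain (IG) is computed from the presence or absence of each term, and it keeps the top `k` terms per class. The function names, the toy corpus, and the value of `k` are hypothetical and chosen only for illustration; the sketch covers the selection step alone, not the PDF projection step or the J48 classifier.

```python
import math


def entropy(pos, neg):
    """Binary entropy (in bits) of a group with `pos` and `neg` members."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(doc_sets, labels, term, target):
    """IG of `term` presence for the one-vs-rest problem `target` vs. the rest."""
    with_pos = with_neg = without_pos = without_neg = 0
    for tokens, label in zip(doc_sets, labels):
        is_target = label == target
        if term in tokens:
            if is_target:
                with_pos += 1
            else:
                with_neg += 1
        else:
            if is_target:
                without_pos += 1
            else:
                without_neg += 1
    n = len(doc_sets)
    n_with = with_pos + with_neg
    n_without = n - n_with
    prior = entropy(with_pos + without_pos, with_neg + without_neg)
    conditional = (n_with / n) * entropy(with_pos, with_neg) \
        + (n_without / n) * entropy(without_pos, without_neg)
    return prior - conditional


def class_specific_features(docs, labels, k):
    """Return, for each class, its k highest-IG terms (ties broken alphabetically)."""
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    selected = {}
    for target in sorted(set(labels)):
        scored = [(information_gain(doc_sets, labels, t, target), t) for t in vocab]
        scored.sort(key=lambda s: (-s[0], s[1]))
        selected[target] = [t for _, t in scored[:k]]
    return selected


# Toy, hypothetical corpus: each document is a list of tokens.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "price"],
    ["election", "vote", "party"],
    ["party", "vote", "policy"],
]
labels = ["sport", "sport", "finance", "finance", "politics", "politics"]

selected = class_specific_features(docs, labels, k=2)
for cls, terms in selected.items():
    print(cls, terms)
```

On this toy corpus each class keeps the two terms that occur in all of its own documents and in no others, so the subsets differ from class to class. Another criterion such as MD could be substituted by replacing `information_gain` with a different scoring function.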

Published

2018-04-30

Section

Research Articles

How to Cite

[1]
Rupali Patil, Ms. V. M. Barkade, "Survey on Classification Approach for Text Categorization", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 4, pp. 1006-1010, March-April 2018.