Efficient Feature Selection and Classification Technique For Large Data

Authors

  • P. Arumugam, Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu, India
  • P. Jose, Research Scholar, Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu, India

Keywords:

Feature Selection, Classification, PSO, GWO, SVM

Abstract

Grey wolf optimizer (GWO) is a recently proposed heuristic evolutionary algorithm inspired by the leadership hierarchy and hunting mechanism of grey wolves in nature. Feature selection plays a vital role in large datasets: it reduces the data set without affecting classifier accuracy and increases the efficiency of high-dimensional classification by retaining the important features when many features are irrelevant or correlated. Feature selection is therefore used as a pre-processing step before a classifier is applied to a data set. A good choice of features leads to high classification accuracy and minimizes computational cost. Although different kinds of feature selection methods have been investigated for selecting and fitting features, the algorithm that maximizes classification accuracy should be preferred. This paper proposes an intelligent optimization method that simultaneously determines the parameter values and discovers a subset of features to increase SVM classification accuracy. The initial subset selection is based on the recent bio-inspired grey wolf optimization technique, which imitates the hunting process of grey wolves. The optimizer searches the feature space for an optimal feature subset in diverse directions in order to reduce the chance of being trapped in a local minimum and to enhance the convergence speed. The novel approach aims to speed up training and to optimize SVM classifier accuracy automatically. The proposed model is used to select a minimum number of features while providing high classification accuracy on large datasets.
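
The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a nearest-centroid classifier stands in for the SVM so that the example needs only the standard library, positions are binarised by thresholding at 0.5, the fitness subtracts an arbitrary 0.01 penalty per fraction of features kept, the data are synthetic, and the simultaneous tuning of SVM parameters is omitted. All function names are hypothetical.

```python
import random

def make_toy_data(n_samples=120, n_features=8, n_informative=2, seed=0):
    """Synthetic two-class data; only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Test accuracy of a nearest-centroid classifier (assumed stand-in for the SVM)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_test, y_test):
        pred = min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        correct += pred == t
    return correct / len(y_test)

def fitness(position, data, penalty=0.01):
    """Accuracy minus a small penalty for the fraction of features kept (assumed form)."""
    mask = [p > 0.5 for p in position]  # assumed binarisation: keep feature if p > 0.5
    return centroid_accuracy(*data, mask) - penalty * sum(mask) / len(mask)

def gwo_select(data, n_features, n_wolves=8, n_iter=30, seed=1):
    """Grey wolf optimizer over [0, 1]^n_features; returns (feature mask, fitness)."""
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]
    best_pos, best_fit = None, float("-inf")
    for t in range(n_iter + 1):
        # Rank the pack; the three fittest wolves act as alpha, beta and delta.
        ranked = sorted(wolves, key=lambda w: fitness(w, data), reverse=True)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        top = fitness(alpha, data)
        if top > best_fit:
            best_pos, best_fit = list(alpha), top
        if t == n_iter:
            break
        a = 2.0 * (1 - t / n_iter)  # decreases linearly from 2 towards 0
        new_wolves = []
        for w in wolves:
            new_w = []
            for j in range(n_features):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    estimate += leader[j] - A * D
                # Average the three leader-guided moves and clamp to [0, 1].
                new_w.append(min(1.0, max(0.0, estimate / 3)))
            new_wolves.append(new_w)
        wolves = new_wolves
    return [p > 0.5 for p in best_pos], best_fit

X, y = make_toy_data()
data = (X[:80], y[:80], X[80:], y[80:])  # simple train/test split
mask, score = gwo_select(data, n_features=8)
print("selected features:", [j for j, keep in enumerate(mask) if keep])
print("fitness:", round(score, 3))
```

In a setting closer to the one the abstract describes, the fitness function would instead train an SVM on the masked features, and the position vector could be extended with the SVM parameters so that both are searched together.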

Published

2017-04-30

Section

Research Articles

How to Cite

[1]
P. Arumugam, P. Jose, "Efficient Feature Selection and Classification Technique For Large Data," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 2, pp. 1041-1047, March-April 2017.