A Comparative Study on Data Mining Algorithms for Classification & Regression

Authors

  • D. Kavitha, CSE, Anna University/Dhaanish Ahmed College of Engineering, Chennai, Tamil Nadu, India
  • K. Sivasankari, CSE, Anna University/Dhaanish Ahmed College of Engineering, Chennai, Tamil Nadu, India
  • M. Pavethra, CSE, Anna University/Dhaanish Ahmed College of Engineering, Chennai, Tamil Nadu, India

Keywords:

Data Mining, Classification, Regression, SVM, KNN.

Abstract

Today, most organizations actively collect and store data in large databases. The growing demand for retrieval and analysis of this data is met by data mining (DM), the process of extracting hidden information from a large database or data warehouse. DM uses different types of algorithms for retrieval and analysis. Based on their applications, data mining algorithms are classified into five types: classification, regression, segmentation, association, and sequence analysis. In this paper we present a comparative study of classification and regression algorithms. The paper describes each of these algorithms, and a comparison between them adds to the value of the study.

Published

2018-02-28

Section

Research Articles

How to Cite

[1]
D. Kavitha, K. Sivasankari, M. Pavethra, "A Comparative Study on Data Mining Algorithms for Classification & Regression," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 1, pp. 396-399, January-February 2018.