Guidance to Data Mining in Python

Authors

  • Aashish Mamgain, CSE, HMR Institute of Technology and Management, New Delhi, India

Keywords:

Python, Data Mining, Classification, Random Forest, Support Vector Machine, Decision Tree, Logistic Regression

Abstract

Python has become a leading programming language for data mining in recent years. Around 45% of data scientists use Python for data mining, which puts it ahead of other analytical tools such as R. Data mining is the technique of analyzing large datasets to extract predictive patterns and information. It is applied in many domains, including marketing, medicine, and telecommunications. This paper presents classification algorithms such as Random Forest, Support Vector Machine, Decision Tree, and Logistic Regression, and serves as a guide to applying these data mining classification techniques in Python.
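
As a rough illustration of the four classifiers named in the abstract, the sketch below trains each one on the same data and compares test accuracy. It is not the paper's code. The use of scikit-learn, the Iris dataset, the 70/30 split, the feature scaling, the hyperparameters, and the helper name compare_classifiers are all assumptions made for this example.

```python
# Minimal sketch, not the paper's implementation.
# Assumptions: scikit-learn, the Iris dataset, a 70/30 stratified split,
# and default-like hyperparameters chosen only for illustration.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(random_state=0):
    """Fit four classifiers on Iris and return {name: test accuracy}."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )

    models = {
        "Random Forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
        # SVM and logistic regression are scale-sensitive, so both are
        # wrapped in a pipeline that standardizes the features first.
        "Support Vector Machine": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)  # learn from the training split
        predictions = model.predict(X_test)  # classify unseen samples
        scores[name] = accuracy_score(y_test, predictions)
    return scores


if __name__ == "__main__":
    for name, accuracy in compare_classifiers().items():
        print(f"{name}: {accuracy:.3f}")
```

A single held-out split on a small dataset gives only a coarse comparison. Cross-validation would likely give a more reliable ranking of the models.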

Published

2018-08-30

Section

Research Articles

How to Cite

[1]
Aashish Mamgain, "Guidance to Data Mining in Python", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 6, pp. 596-601, July-August 2018.