Auto Determination of K in KMEANS with MAP-REDUCE for Numerical and Text Datasets

Authors

  • K. P. Shiudkar, ME CSE Student, Bharati Vidyapeeth College of Engineering, Kolhapur, Maharashtra, India
  • Prof. S. B. Takmare, Assistant Professor, Department of CSE, A. P. Shah Institute of Technology, Thane, Maharashtra, India
  • Prof. R. P. Mirajkar, Assistant Professor, Department of CSE, Bharati Vidyapeeth College of Engineering, Kolhapur, Maharashtra, India

Keywords:

Initial Centroids, Clustering, Data mining, Data sets, K-means clustering, Map-Reduce.

Abstract

Data mining is the process of automatically discovering useful information in large datasets. Cluster analysis is a very important branch of data mining: it groups data objects according to their attributes and the relationships between them. Clustering very large datasets is a challenging problem for data mining and processing. MapReduce is considered a powerful programming framework that significantly reduces execution time by dividing a job into several tasks and executing them in a distributed environment. K-Means is one of the most widely used clustering methods, and K-Means based on MapReduce is considered an advanced solution for clustering very large datasets. However, execution time is still an obstacle, because the number of iterations grows as the dataset size and the number of clusters increase. Traditional K-Means is computationally expensive, sensitive to outliers, and produces unstable results, which makes it inefficient on very large datasets. Solving these issues is the subject of much recent research. In this paper, we propose Auto Determination of K in K-Means with MapReduce for numerical and text datasets, which adapts K-Means to large-scale datasets by reducing its execution time. In addition, we propose algorithms that find the initial centroids automatically, and clusters are formed on both numerical and text datasets.
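The abstract notes that MapReduce splits the work into tasks and that execution time grows with the number of K-Means iterations. The sketch below illustrates how one K-Means iteration maps onto MapReduce phases. It is a plain-Python simulation, assuming a single process stands in for a Hadoop cluster: each element of `splits` plays the role of one input split handled by a mapper, `combine` is an optional combiner that pre-aggregates per-centroid sums, and `reduce_phase` computes each new centroid. All function names are hypothetical. K and the initial centroids are supplied by the caller, so the authors' own procedures for determining K and selecting initial centroids are not shown here.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical single-process simulation of MapReduce K-Means (not the authors' code).

def map_phase(split, centroids):
    """Mapper: emit (index of nearest centroid, (point, 1)) for every point in one split."""
    for point in split:
        idx = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
        yield idx, (list(point), 1)

def combine(pairs):
    """Combiner: sum points per centroid locally so fewer records are shuffled."""
    partial = {}
    for idx, (vec, count) in pairs:
        if idx in partial:
            s, c = partial[idx]
            partial[idx] = ([a + b for a, b in zip(s, vec)], c + count)
        else:
            partial[idx] = (vec, count)
    return partial.items()

def reduce_phase(values):
    """Reducer: add up the partial sums for one centroid and return the new mean."""
    total, n = None, 0
    for vec, count in values:
        total = vec if total is None else [a + b for a, b in zip(total, vec)]
        n += count
    return [x / n for x in total]

def kmeans_iteration(splits, centroids):
    """One MapReduce job: map + combine per split, shuffle by key, then reduce."""
    shuffled = defaultdict(list)
    for split in splits:
        for idx, value in combine(map_phase(split, centroids)):
            shuffled[idx].append(value)
    new_centroids = list(centroids)  # a centroid with no points keeps its old position
    for idx, values in shuffled.items():
        new_centroids[idx] = reduce_phase(values)
    return new_centroids

def kmeans(splits, centroids, max_iter=20, tol=1e-6):
    """Repeat MapReduce jobs until no centroid moves more than tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(splits, centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift < tol:
            break
    return centroids

if __name__ == "__main__":
    splits = [[(1.0, 1.0), (1.5, 2.0)], [(8.0, 8.0), (9.0, 8.5)], [(1.0, 0.5), (8.5, 9.0)]]
    print(kmeans(splits, [(1.0, 1.0), (9.0, 9.0)]))
    # prints [[1.1666666666666667, 1.1666666666666667], [8.5, 8.5]]
```

Each call to `kmeans_iteration` corresponds to one full MapReduce job, so every extra iteration adds another round of map, shuffle, and reduce. This is why reducing the number of iterations (for example through better initial centroids) is the main lever on execution time that the abstract refers to.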

Published

2018-07-30

Section

Research Articles

How to Cite

[1] K. P. Shiudkar, S. B. Takmare, R. P. Mirajkar, "Auto Determination of K in KMEANS with MAP-REDUCE for Numerical and Text Datasets," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 6, pp. 123-127, July-August 2018.