Auto Determination of K in KMEANS with MAP-REDUCE for Numerical and Text Datasets

Authors: K. P. Shiudkar, Prof. S. B. Takmare, Prof. R. P. Mirajkar

Data mining is the process of automatically discovering useful information in large datasets, and cluster analysis is one of its most important branches. Cluster analysis groups data objects based on the objects themselves and the relationships between them. Clustering very large datasets is a challenging problem for data mining and processing. MapReduce is a powerful programming framework that can significantly reduce execution time by dividing a job into several tasks and executing them in a distributed environment. K-Means is one of the most widely used clustering methods, and K-Means based on MapReduce is considered an advanced solution for clustering very large datasets. However, execution time remains an obstacle, because the number of iterations grows as the dataset size and the number of clusters increase. Traditional K-Means is computationally expensive, sensitive to outliers, and produces unstable results, which makes it inefficient on very large datasets. Solving these issues is the subject of much recent research. In this paper, we propose automatic determination of K in K-Means with MapReduce for numerical and text datasets, adapting the algorithm to large-scale datasets by reducing its execution time. In addition, we propose algorithms that find the initial centroids automatically, and clusters are formed on both numerical and text datasets.
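The abstract does not spell out the procedure, so the following is only a minimal sketch, assuming in-memory Python lists instead of a Hadoop cluster, of how one K-Means iteration can be phrased as a map step (assign each point to its nearest centroid) and a reduce step (average the points per cluster). The seeding helper `farthest_first` and the K-selection rule `choose_k` (keep adding clusters while the sum of squared errors, SSE, drops by at least a fraction `min_gain`) are hypothetical illustrations chosen for this example, not the initial-centroid or auto-K algorithms proposed in the paper.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, (point, 1)) for each point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, (p, 1)


def reduce_phase(pairs):
    """Reduce step: sum points and counts per cluster key, then average them."""
    sums, counts = {}, defaultdict(int)
    for key, (p, n) in pairs:
        if key in sums:
            sums[key] = [a + b for a, b in zip(sums[key], p)]
        else:
            sums[key] = list(p)
        counts[key] += n
    return {key: tuple(s / counts[key] for s in sums[key]) for key in sums}


def farthest_first(points, k):
    """Hypothetical deterministic seeding: start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Run K-Means iterations as repeated map/reduce rounds; return (centroids, SSE)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids))
        # A cluster that received no points keeps its previous centroid.
        new = [updated.get(i, c) for i, c in enumerate(centroids)]
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    sse = sum(min(dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, sse


def choose_k(points, max_k=10, min_gain=0.3):
    """Hypothetical rule for picking K: keep increasing K while the relative
    drop in SSE is at least min_gain; return the last K that passed."""
    best_k, (best_c, best_sse) = 1, kmeans(points, 1)
    for k in range(2, max_k + 1):
        cand_c, cand_sse = kmeans(points, k)
        if best_sse == 0 or (best_sse - cand_sse) / best_sse < min_gain:
            break
        best_k, best_c, best_sse = k, cand_c, cand_sse
    return best_k, best_c


# Toy data: three well-separated groups of four 2-D points.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

if __name__ == "__main__":
    k, centroids = choose_k(POINTS)
    print(k, centroids)
```

On this toy data the rule stops at K = 3, because a fourth cluster only splits one tight group and barely lowers the SSE. In an actual MapReduce job, `map_phase` and `reduce_phase` would run as distributed tasks once per iteration; text documents would first have to be converted to numeric vectors (for example TF-IDF) before the same steps could be applied.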

Authors and Affiliations

K. P. Shiudkar
ME CSE Student, Bharati Vidyapeeth College of Engineering, Kolhapur, Maharashtra, India
Prof. S. B. Takmare
Assistant Professor, Department of CSE, A. P. Shah Institute of Technology, Thane, Maharashtra, India
Prof. R. P. Mirajkar
Assistant Professor, Department of CSE, Bharati Vidyapeeth College of Engineering, Kolhapur, Maharashtra, India

Keywords: Initial Centroids, Clustering, Data Mining, Datasets, K-Means Clustering, Map-Reduce.

Publication Details

Published in: Volume 3 | Issue 6 | July-August 2018
Date of Publication: 2018-07-30
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s): 123-127
Manuscript Number: CSEIT1183627
Publisher: Technoscience Academy
ISSN: 2456-3307

Cite This Article:

K. P. Shiudkar, Prof. S. B. Takmare, Prof. R. P. Mirajkar, "Auto Determination of K in KMEANS with MAP-REDUCE for Numerical and Text Datasets", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 6, pp. 123-127, July-August 2018.