Novel way of finding initial means in k-means clustering and validation using WEKA

Authors (2) : Amit Mithal, Rohit Mittal

This work proposes a novel way of choosing the initial means in k-means clustering, replacing the usual random selection. The Iris flower dataset, which contains 150 labeled instances of three Iris species described by 5 attributes, is used for implementation and validation. To find the proposed initial means, objects that lie very far from the rest of the objects in their respective clusters are identified and eliminated. The centroids of the resulting k reduced clusters are then taken as the initial means for k-means clustering. The results show that the algorithm requires significantly fewer iterations when started from the proposed initial means.
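The procedure summarized above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' WEKA implementation. The abstract does not say how the clusters are obtained or how "very far away" is decided, so the sketch assumes a preliminary k-means run from conventional starting means and a cutoff of `factor` times the mean distance to the cluster centroid. The names `kmeans`, `refined_initial_means`, and `factor` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, means, max_iter=100):
    """Plain k-means; returns (means, clusters, iterations used)."""
    means = list(means)
    clusters = [[] for _ in means]
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda i: dist(p, means[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous mean.
        new_means = [centroid(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:  # assignments stopped changing
            return means, clusters, iteration
        means = new_means
    return means, clusters, max_iter


def refined_initial_means(points, start_means, factor=1.5):
    """Trim far-away objects from each cluster, then return the centroids
    of the reduced clusters as the proposed initial means.

    ASSUMPTION: the clusters come from a preliminary k-means run, and an
    object counts as "very far away" when its distance to the cluster
    centroid exceeds factor * (mean distance in that cluster).
    """
    means, clusters, _ = kmeans(points, start_means)
    refined = []
    for m, cluster in zip(means, clusters):
        if not cluster:
            refined.append(m)
            continue
        d = [dist(p, m) for p in cluster]
        cutoff = factor * (sum(d) / len(d))
        kept = [p for p, dp in zip(cluster, d) if dp <= cutoff]
        refined.append(centroid(kept) if kept else m)
    return refined
```

To compare iteration counts in the way the abstract describes, one could call `kmeans(points, start_means)` and `kmeans(points, refined_initial_means(points, start_means))` on the same data and inspect the third element of each result.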

Authors and Affiliations

Amit Mithal
Department of Computer Science & Engineering, Jaipur Engineering College & Research Centre, Jaipur, Rajasthan, India
Rohit Mittal
Department of Computer Science & Engineering, Arya College of Engineering & Information Technology, Jaipur, Rajasthan, India

Keywords : Clustering, k-means, WEKA, Iris

Publication Details

Published in : Volume 2 | Issue 5 | September-October 2017
Date of Publication : 2017-10-31
License : This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 704-708
Manuscript Number : CSEIT1725162
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Amit Mithal, Rohit Mittal, "Novel way of finding initial means in k-means clustering and validation using WEKA", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 5, pp. 704-708, September-October 2017.
Journal URL : http://ijsrcseit.com/CSEIT1725162
