Novel way of finding initial means in k-means clustering and validation using WEKA

Authors

  • Amit Mithal, Department of Computer Science & Engineering, Jaipur Engineering College & Research Centre, Jaipur, Rajasthan, India
  • Rohit Mittal, Department of Computer Science & Engineering, Arya College of Engineering & Information Technology, Jaipur, Rajasthan, India

Keywords:

Clustering, k-means, Weka, Iris

Abstract

This work proposes a novel way of choosing the initial means in k-means clustering, in place of the usual random selection. The dataset used for implementation and validation is the Iris flower dataset, which contains 150 labeled instances of three Iris species described by 5 attributes (four numeric measurements plus the class label). To find the proposed initial means, objects that lie very far from the rest of the objects in their respective clusters are identified and eliminated from a k-means clustering. The centroids of the resulting k reduced clusters are then taken as the initial means for k-means clustering. The results show that the algorithm requires significantly fewer iterations when started from the proposed initial means.
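The procedure described in the abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how "very far away" is decided, so the cutoff used here (a multiple of the mean distance to the cluster centroid) is an assumption, as are the function names `kmeans` and `refined_initial_means`, the `factor` parameter, and the synthetic two-dimensional data used in place of the Iris dataset.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, means, max_iter=100):
    """Plain Lloyd-style k-means. Returns (means, clusters, iterations)."""
    means = [tuple(m) for m in means]
    for it in range(1, max_iter + 1):
        # Assignment step: each point joins the cluster of its nearest mean.
        clusters = [[] for _ in means]
        for p in points:
            j = min(range(len(means)), key=lambda i: dist(p, means[i]))
            clusters[j].append(p)
        # Update step: an empty cluster keeps its previous mean.
        new_means = [centroid(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:
            return means, clusters, it
        means = new_means
    return means, clusters, max_iter


def refined_initial_means(clusters, means, factor=2.0):
    """Drop far-away objects from each cluster, then return the centroids
    of the reduced clusters.

    ASSUMPTION: an object counts as "far away" when its distance to the
    cluster centroid exceeds factor * (mean distance in that cluster).
    The source does not specify this rule.
    """
    refined = []
    for c, m in zip(clusters, means):
        if not c:
            refined.append(tuple(m))  # nothing to refine in an empty cluster
            continue
        centre = centroid(c)
        d = [dist(p, centre) for p in c]
        cutoff = factor * (sum(d) / len(d))
        kept = [p for p, di in zip(c, d) if di <= cutoff]
        refined.append(centroid(kept))
    return refined


# Synthetic stand-in data (NOT the Iris dataset): three 2-D blobs.
rng = random.Random(0)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in centers for _ in range(30)]

# Baseline run from randomly chosen initial means.
random_means = rng.sample(points, 3)
means1, clusters1, iters1 = kmeans(points, random_means)

# Second run started from the centroids of the reduced clusters.
proposed = refined_initial_means(clusters1, means1)
means2, clusters2, iters2 = kmeans(points, proposed)

print("iterations from random start:", iters1)
print("iterations from refined start:", iters2)
```

The iteration count here includes the final pass that confirms the means no longer change. Varying `factor` changes how aggressively objects are eliminated; the value 2.0 is arbitrary and chosen only for illustration.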

Published

2017-10-31

Section

Research Articles

How to Cite

[1]
Amit Mithal, Rohit Mittal, "Novel way of finding initial means in k-means clustering and validation using WEKA", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 5, pp. 704-708, September-October 2017.