Novel way of finding initial means in k-means clustering and validation using WEKA

Amit Mithal; Rohit Mittal

doi:10.32628/CSEIT1725162

Authors

Amit Mithal, Department of Computer Science & Engineering, Jaipur Engineering College & Research Centre, Jaipur, Rajasthan, India
Rohit Mittal, Department of Computer Science & Engineering, Arya College of Engineering & Information Technology, Jaipur, Rajasthan, India

Keywords:

Clustering, k-means, Weka, Iris

Abstract

This work proposes a new way of choosing the initial means in k-means clustering, replacing the usual random choice. The implementation and validation use the Iris flower dataset, which contains 150 labeled instances described by 5 attributes (four measurements and the species label) across the three Iris species. To find the proposed initial means, objects that lie very far from the rest of the objects in their respective clusters are identified and eliminated. The centroids of the resulting k reduced clusters are then taken as the initial means for k-means clustering. The results show that the algorithm requires significantly fewer iterations when started from the proposed initial means.
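The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how "very far away" is decided, so the rule used here (distance to the cluster mean greater than the cluster's average distance plus `n_std` standard deviations) is an assumption, as are the function names `kmeans` and `refined_initial_means` and the parameter `n_std`.

```python
import numpy as np


def kmeans(X, init_means, max_iter=100):
    """Plain Lloyd's k-means. Returns (means, labels, iterations used)."""
    means = np.asarray(init_means, dtype=float).copy()
    labels = np.zeros(len(X), dtype=int)
    for it in range(1, max_iter + 1):
        # Assign each object to its nearest mean (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each mean; keep the old one if a cluster is empty.
        new_means = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
            for j in range(len(means))
        ])
        if np.allclose(new_means, means):
            return new_means, labels, it
        means = new_means
    return means, labels, max_iter


def refined_initial_means(X, k, n_std=2.0, seed=0):
    """Centroids of the k clusters after dropping far-away objects.

    ASSUMPTION: an object counts as "very far away" when its distance to
    its cluster mean exceeds mean distance + n_std * standard deviation.
    The source does not state the actual criterion.
    """
    rng = np.random.default_rng(seed)
    # Preliminary clustering from randomly chosen objects.
    start = X[rng.choice(len(X), size=k, replace=False)]
    means, labels, _ = kmeans(X, start)

    refined = []
    for j in range(k):
        members = X[labels == j]
        if len(members) == 0:
            refined.append(means[j])
            continue
        d = np.linalg.norm(members - means[j], axis=1)
        keep = d <= d.mean() + n_std * d.std()
        # The centroid of the reduced cluster becomes an initial mean.
        refined.append(members[keep].mean(axis=0))
    return np.array(refined)
```

A typical use would be to call `refined_initial_means(X, k)` on a numeric feature matrix, pass the result to `kmeans` as `init_means`, and compare the returned iteration count against a run started from randomly chosen objects. The sketch makes no claim about how large the difference is on any particular dataset.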

