A Deterministic Seeding Approach for k-means Clustering

Omar Kettani

doi:10.32628/CSEIT217246

Authors

Omar Kettani, Scientific Institute, Mohammed V University, Rabat, Morocco

Keywords:

clustering, k-means, initialization, KKZ, silhouette

Abstract

In this work, a simple and efficient approach is proposed to initialize the k-means clustering algorithm. The method runs in O(nk) time, where n is the number of data points and k is the number of clusters. Performance was evaluated by applying the approach to various benchmark datasets and comparing it with the related deterministic KKZ seeding algorithm. Experimental results show that the approach produces more consistent clustering results in terms of the average silhouette index.
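The abstract does not spell out the proposed seeding rule, so the sketch below does not reproduce it. Instead it illustrates the evaluation setting the abstract describes. It starts with the KKZ baseline as that method is commonly described: the first seed is the point of largest norm, and each further seed is the point farthest from its nearest chosen seed. It then runs Lloyd's k-means iterations from those seeds and scores the result with the average silhouette index. The function names `kkz_seeds`, `lloyd` and `average_silhouette` are hypothetical. Euclidean distance is assumed throughout, and only the Python standard library is used.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kkz_seeds(data: Sequence[Vector], k: int) -> List[List[float]]:
    """KKZ-style seeding (assumed description): O(nk) distance evaluations."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    # First seed: the point with the largest Euclidean norm (first one on ties).
    first = max(range(n), key=lambda i: sum(x * x for x in data[i]))
    seeds = [first]
    # nearest[i]: squared distance from point i to its closest seed so far.
    nearest = [sq_dist(p, data[first]) for p in data]
    while len(seeds) < k:
        # Next seed: the point farthest from its nearest existing seed.
        nxt = max(range(n), key=lambda i: nearest[i])
        seeds.append(nxt)
        # One O(n) pass keeps `nearest` current, giving O(nk) overall.
        for i, p in enumerate(data):
            nearest[i] = min(nearest[i], sq_dist(p, data[nxt]))
    return [list(data[i]) for i in seeds]


def lloyd(data: Sequence[Vector], centers: Sequence[Vector],
          max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd (k-means) iterations from the given initial centers."""
    centers = [list(c) for c in centers]  # do not mutate the caller's seeds
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in data]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each non-empty center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def average_silhouette(data: Sequence[Vector], labels: Sequence[int]) -> float:
    """Mean silhouette width over all points, using Euclidean distance."""
    if len(set(labels)) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(data):
        sums: Dict[int, float] = {}
        counts: Dict[int, int] = {}
        for j, q in enumerate(data):
            if j != i:
                sums[labels[j]] = sums.get(labels[j], 0.0) + math.dist(p, q)
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in counts if c != own)  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(data)


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    seeds = kkz_seeds(pts, 2)
    print(seeds)  # [[10.0, 11.0], [0.0, 0.0]]
    labels, _ = lloyd(pts, seeds)
    print(labels)  # [1, 1, 1, 0, 0, 0]
    print(average_silhouette(pts, labels))
```

The sketch keeps a running nearest-seed distance for every point. Each new seed therefore costs a single O(n) pass, which gives the O(nk) total that the abstract also states for the proposed method. Because KKZ involves no randomness, repeated runs on the same data always return the same seeds.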

References

Aloise, D., Deshpande, A., Hansen, P., and Popat, P. (2009). NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75, 245–249. doi:10.1007/s10994-009-5103-0.
Arthur, D. and Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035.
Asuncion, A. and Newman, D. J. (2007). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. http://www.ics.uci.edu/~mlearn/MLRepository.html
Katsavounidis, I., Kuo, C.-C. J., and Zhang, Z. (1994). A new initialization technique for generalized Lloyd iteration. IEEE Signal Processing Letters, 1, 144–146.
Kaufman, L. and Rousseeuw, P. (2005). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley.
Lloyd, S. P. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137. doi:10.1109/TIT.1982.1056489.
