A Deterministic Seeding Approach for k-means Clustering

Authors

  • Omar Kettani  Scientific Institute, Mohammed V University, Rabat, Morocco

DOI:

https://doi.org/10.32628/CSEIT217246

Keywords:

clustering, k-means, initialization, KKZ, silhouette

Abstract

In this work, a simple and efficient approach is proposed to initialize the k-means clustering algorithm. The complexity of this method is O(nk), where n is the number of data points and k is the number of clusters. Performance was evaluated by applying this approach to various benchmark datasets and comparing it with the related deterministic KKZ seeding algorithm. Experimental results show that this approach produces more consistent clustering results in terms of the average silhouette index.
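For orientation, the KKZ baseline mentioned in the abstract can be sketched as follows. This is not the method proposed in the paper, which the abstract does not spell out; it is a minimal sketch of the standard KKZ rule (Katsavounidis, Kuo and Zhang, 1994) as it is usually described: take the point with the largest Euclidean norm as the first seed, then repeatedly add the point that lies farthest from its nearest already chosen seed. The function name kkz_seeds and the list-of-tuples input format are assumptions made for this illustration.

```python
import math


def kkz_seeds(points, k):
    """Return k seeds chosen by the KKZ rule (illustrative sketch only).

    points: sequence of equal-length numeric tuples (assumed input format).
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    # First seed: the point with the largest Euclidean norm.
    origin = [0.0] * len(points[0])
    first = max(range(n), key=lambda i: math.dist(points[i], origin))
    chosen = [first]

    # min_dist[i] holds the distance from point i to its nearest chosen seed.
    min_dist = [math.dist(p, points[first]) for p in points]

    while len(chosen) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        nxt = max(range(n), key=min_dist.__getitem__)
        chosen.append(nxt)
        # One pass over the data per new seed: O(n) distance evaluations.
        min_dist = [min(d, math.dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]

    return [points[i] for i in chosen]


data = [(0, 0), (1, 0), (10, 0), (10, 1), (0, 5)]
seeds = kkz_seeds(data, 3)
```

Each new seed costs one pass over the n points, so selecting k seeds takes O(nk) distance evaluations, the same order as the complexity quoted in the abstract. No random numbers are involved, so repeated runs on the same data return the same seeds (ties are broken by taking the earliest point).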

References

  1. Aloise, D.; Deshpande, A.; Hansen, P.; Popat, P. (2009). "NP-hardness of Euclidean sum-of-squares clustering". Machine Learning 75: 245–249. doi:10.1007/s10994-009-5103-0.
  2. Arthur, D.; Vassilvitskii, S. (2007). "k-means++: the advantages of careful seeding". Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms: 1027–1035.
  3. Asuncion, A.; Newman, D. J. (2007). UCI Machine Learning Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html. Irvine, CA: University of California, School of Information and Computer Science.
  4. Katsavounidis, I.; Jay Kuo, C. C.; Zhang, Z. (1994). "A new initialization technique for generalized Lloyd iteration". IEEE Signal Processing Letters 1: 144–146.
  5. Kaufman, L.; Rousseeuw, P. (2005). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley.
  6. Lloyd, S. P. (1982). "Least squares quantization in PCM". IEEE Transactions on Information Theory 28 (2): 129–137. doi:10.1109/TIT.1982.1056489.

Published

2021-04-30

Section

Research Articles

How to Cite

[1]
Omar Kettani, "A Deterministic Seeding Approach for k-means Clustering", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 7, Issue 2, pp. 192-195, March-April 2021. Available at doi: https://doi.org/10.32628/CSEIT217246