A Deterministic Seeding Approach for k-means Clustering
DOI:
https://doi.org/10.32628/CSEIT217246
Keywords:
clustering, k-means, initialization, KKZ, silhouette
Abstract
In this work, a simple and efficient approach is proposed to initialize the k-means clustering algorithm. The complexity of this method is O(nk), where n is the number of data points and k is the number of clusters. Performance was evaluated by applying the approach to various benchmark datasets and comparing it with the related deterministic KKZ seeding algorithm. Experimental results demonstrate that this approach produces more consistent clustering results in terms of the average silhouette index.
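For orientation, the KKZ baseline named in the abstract can be sketched as follows. This is a minimal illustration of the KKZ rule as it is commonly described (first seed is the point with the largest norm; each later seed is the point farthest from its nearest already-chosen seed). It is not the method proposed in this paper, which the abstract does not specify. The function names `sq_dist` and `kkz_seeds` are hypothetical, chosen only for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kkz_seeds(points, k):
    """Return the indices of k seeds picked by the KKZ rule (assumed form).

    1. The first seed is the point with the largest Euclidean norm.
    2. Each further seed is the point whose distance to its nearest
       already-chosen seed is largest.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    origin = [0.0] * len(points[0])
    first = max(range(len(points)), key=lambda i: sq_dist(points[i], origin))
    seeds = [first]

    # nearest[i] caches the squared distance from point i to its closest seed,
    # so each new seed costs one pass over the data (n distance evaluations).
    nearest = [sq_dist(p, points[first]) for p in points]

    while len(seeds) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        seeds.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], sq_dist(p, points[nxt]))
    return seeds


points = [(0, 0), (1, 0), (10, 0), (10, 1), (0, 9)]
print(kkz_seeds(points, 3))
```

Because the nearest-seed distances are cached, this sketch performs on the order of nk distance evaluations. The procedure is deterministic: the same data always yields the same seeds, so repeated k-means runs started from them give the same result.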
References
- Aloise, D.; Deshpande, A.; Hansen, P.; Popat, P. (2009). "NP-hardness of Euclidean sum-of-squares clustering". Machine Learning 75: 245–249. doi:10.1007/s10994-009-5103-0.
- Arthur, D.; Vassilvitskii, S. (2007). "k-means++: the advantages of careful seeding". Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms: 1027–1035.
- Asuncion, A.; Newman, D. J. (2007). UCI Machine Learning Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html. Irvine, CA: University of California, School of Information and Computer Science.
- Katsavounidis, I.; Kuo, C.-C. J.; Zhang, Z. (1994). "A new initialization technique for generalized Lloyd iteration". IEEE Signal Processing Letters 1: 144–146, Oct. 1994.
- Kaufman, L.; Rousseeuw, P. (2005). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley.
- Lloyd, S. P. (1982). "Least squares quantization in PCM". IEEE Transactions on Information Theory 28 (2): 129–137. doi:10.1109/TIT.1982.1056489.
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.