Survey on Clustering High-Dimensional Data using Hubness
DOI:
https://doi.org/10.32628/CSEIT195671
Keywords:
Hubness, Clustering Methods, Data Mining Techniques
Abstract
Most data of interest in today's data-mining applications is complex and is usually represented by many different features. By its very nature, such high-dimensional data is often difficult for conventional machine-learning algorithms to handle, an aspect of the well-known curse of dimensionality. High-dimensional data therefore needs to be processed with care, and the design of machine-learning algorithms needs to take these factors into account. It has also been observed that some of the properties that arise in high dimensions can in fact be exploited to improve overall algorithm design. One such phenomenon, related to nearest-neighbor learning methods, is known as hubness: the emergence of very influential nodes (hubs) in k-nearest-neighbor graphs. A crisp weighted voting scheme for the k-nearest-neighbor classifier that exploits this notion has recently been proposed.
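To make the notion of hubness concrete, the following is a minimal sketch, not taken from the paper, of how the k-occurrence count of each point (the number of times it appears in other points' k-nearest-neighbor lists) might be computed. The function names `k_occurrences` and `skewness`, the Euclidean distance, the synthetic Gaussian data, and the choice of k = 5 are all illustrative assumptions. A strongly right-skewed distribution of these counts is the usual sign that a few points act as hubs.

```python
import numpy as np


def k_occurrences(X, k):
    """Count how often each point appears in other points' k-NN lists.

    Hypothetical helper for illustration; assumes Euclidean distance
    and that X has more than k rows.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances via the expansion of ||a - b||^2.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # A point is not allowed to be its own neighbor.
    np.fill_diagonal(d2, np.inf)
    # Indices of the k nearest neighbors of every point (order within k is arbitrary).
    knn = np.argpartition(d2, k, axis=1)[:, :k]
    # N_k(x): number of k-NN lists in which x occurs.
    return np.bincount(knn.ravel(), minlength=n)


def skewness(values):
    """Standardized third moment; large positive values suggest hubs."""
    v = np.asarray(values, dtype=float)
    c = v - v.mean()
    return np.mean(c ** 3) / np.mean(c ** 2) ** 1.5


# Illustrative comparison on synthetic data (an assumption, not the paper's data).
rng = np.random.default_rng(0)
k = 5
for dim in (3, 100):
    X = rng.standard_normal((1000, dim))
    n_k = k_occurrences(X, k)
    print(f"dim={dim}: max N_k={n_k.max()}, skewness={skewness(n_k):.2f}")
```

The counts always sum to n times k, so their mean is fixed at k; what changes with dimensionality is how unevenly they are spread across points. A hubness-aware method could then use these counts, for example as point weights, although the specific weighting used by any published scheme is not reproduced here.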
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.