Survey on Clustering High-Dimensional data using Hubness

Miss. Archana Chaudahri; Mr. Nilesh Vani

doi:10.32628/CSEIT195671

Authors

Miss. Archana Chaudahri, ME Scholar, Computer Engineering, GF's GCOE, Jalgaon, Maharashtra, India
Mr. Nilesh Vani, Assistant Professor, Computer Engineering, GF's GCOE, Jalgaon, Maharashtra, India

DOI:

https://doi.org/10.32628/CSEIT195671

Keywords:

Hubness, Clustering Methods, Data Mining Techniques

Abstract

Most data of interest in today's data-mining applications is complex and is usually represented by many different features. By its very nature, such high-dimensional data is often difficult for conventional machine-learning algorithms to handle, which is one aspect of the well-known curse of dimensionality. High-dimensional data therefore needs to be processed with care, and the design of machine-learning algorithms needs to take these factors into account. It has also been observed that some of the properties arising in high dimensions can in fact be exploited to improve overall algorithm design. One such phenomenon, related to nearest-neighbor learning methods, is known as hubness and refers to the emergence of very influential nodes (hubs) in k-nearest neighbor graphs. A crisp weighted voting scheme for the k-nearest neighbor classifier that exploits this notion has recently been proposed.
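To make the notion of hubs concrete, the following minimal sketch counts how often each point appears in the k-nearest-neighbor lists of the other points (its k-occurrence count) and reports the skewness of those counts, which is one common way to quantify hubness. The function names (`k_occurrences`, `skewness`, `gaussian_sample`), the synthetic Gaussian data, and the parameter values are illustrative assumptions, not part of the surveyed work.

```python
import random


def k_occurrences(points, k):
    """Count how often each point appears among the k nearest neighbors of the others."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        # Squared Euclidean distance from point i to every other point.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        ]
        dists.sort()
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Standardized third moment; returns 0.0 when all values are equal."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5


def gaussian_sample(n, dim, seed=0):
    """Hypothetical test data: n points drawn from a standard normal in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    # Compare a low-dimensional and a high-dimensional sample (assumed sizes).
    for dim in (3, 100):
        counts = k_occurrences(gaussian_sample(300, dim), k=5)
        print(dim, max(counts), round(skewness(counts), 2))
```

Points with an unusually large k-occurrence count are the hubs. On data like this, the skewness of the counts tends to grow with dimensionality, although the exact values depend on the sample and on k.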

