Survey on Clustering High-Dimensional data using Hubness

Authors : Miss. Archana Chaudahri, Mr. Nilesh Vani

Most data of interest in today's data-mining applications is complex and is usually represented by many different features. Such high-dimensional data is inherently difficult for conventional machine-learning algorithms to handle, which is one aspect of the well-known curse of dimensionality. High-dimensional data therefore needs to be processed with care, and the design of machine-learning algorithms must take these difficulties into account. It has also been observed that some of the properties that arise in high dimensions can be exploited to improve overall algorithm design. One such phenomenon, related to nearest-neighbor learning methods, is known as hubness: the emergence of very influential points (hubs) that appear unusually often in the k-nearest-neighbor lists of other points and so become high-degree nodes in k-nearest-neighbor graphs. A crisp weighted voting scheme for the k-nearest-neighbor classifier that exploits this notion has recently been proposed.
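To make the notion of hubness concrete, the following is a minimal sketch, not taken from the article, of how it is commonly measured. It assumes Euclidean distance and NumPy; the function names `k_occurrences` and `skewness` and the synthetic Gaussian data are illustrative choices. The k-occurrence count of a point is the number of other points that include it among their k nearest neighbors, and a strongly right-skewed distribution of these counts indicates that a few points act as hubs.

```python
import numpy as np

def k_occurrences(X, k):
    """Count how often each point appears in the k-NN lists of the other points."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances (sufficient for ranking neighbors).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    # Indices of the k nearest neighbors of every point (order within the k is irrelevant).
    knn = np.argpartition(d2, k, axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=X.shape[0])

def skewness(counts):
    """Third standardized moment of the k-occurrence distribution."""
    c = np.asarray(counts, dtype=float)
    centered = c - c.mean()
    return np.mean(centered ** 3) / np.std(c) ** 3

def demo():
    # Illustrative data only: i.i.d. Gaussian points in a low and a high dimension.
    rng = np.random.default_rng(0)
    for d in (3, 100):
        X = rng.standard_normal((500, d))
        counts = k_occurrences(X, k=5)
        print(f"d={d}: max k-occurrence={counts.max()}, skewness={skewness(counts):.2f}")

if __name__ == "__main__":
    demo()
```

Every point contributes exactly k entries to the neighbor lists, so the counts always sum to n * k; hubness concerns how unevenly that total is shared among the points.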

Authors and Affiliations

Miss. Archana Chaudahri
ME Scholar, Computer Engineering, GF's GCOE, Jalgaon, Maharashtra, India
Mr. Nilesh Vani
Assistant Professor, Computer Engineering, GF's GCOE, Jalgaon, Maharashtra, India

Keywords : Hubness, Clustering Methods, Data Mining Techniques

Publication Details

Published in : Volume 6 | Issue 1 | January-February 2020
Date of Publication : 2020-01-05
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 01-07
Manuscript Number : CSEIT195671
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Miss. Archana Chaudahri, Mr. Nilesh Vani, "Survey on Clustering High-Dimensional data using Hubness", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 6, Issue 1, pp.01-07, January-February-2020. Available at doi : https://doi.org/10.32628/CSEIT195671
Journal URL : http://ijsrcseit.com/CSEIT195671