A Review article on Semi-Supervised Clustering Framework for High Dimensional Data

Authors

  • M. Pavithra  Assistant Professor, Department of C.S.E, Jansons Institute of Technology, Coimbatore, India
  • Dr. R. M. S. Parvathi  Professor & Dean - PG Studies, Sri Ramakrishna Institute of Technology, Coimbatore, India

DOI:

https://doi.org/10.32628/CSEIT195410

Keywords:

Cluster Ensemble, Semi-Supervised Clustering, Clustering Analysis, High-Dimensional Data.

Abstract

Cluster analysis methods seek to partition a data set into homogeneous subgroups. They are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable and nothing is known about the relationships between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features [2]. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations [3]. The majority of these methods are modifications of the popular k-means clustering method, and several of them are described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. Clustering can be supervised, unsupervised, or semi-supervised. This paper reviews traditional and state-of-the-art clustering methods [1]. The algorithms covered include clustering based on active learning, ensemble clustering with k-means, data stream clustering with flocks, fuzzy clustering for shape annotation, incremental semi-supervised clustering, weakly supervised clustering with minimal labeled data, and self-organizing clustering based on neural networks. The incremental semi-supervised clustering ensemble framework (ISSCE) combines the random subspace technique, the constraint propagation approach, the proposed incremental ensemble member selection process, and the normalized cut algorithm to cluster high-dimensional data [4].
Semi-supervised clustering uses limited supervision, in the form of labeled instances or pairwise instance constraints, to aid unsupervised clustering, and it often significantly improves clustering performance. Despite the considerable expert effort devoted to this problem, most existing work is not designed to handle high-dimensional sparse data.
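
As an illustration of how labeled instances can guide a k-means style method, the following is a minimal sketch in Python and not the implementation of any specific algorithm covered by the review. It assumes that every cluster has at least one labeled point, that labeled points keep their given label throughout, and that Euclidean distance is used; the function name `seeded_kmeans` and its parameters are hypothetical.

```python
import math


def seeded_kmeans(points, seeds, n_iter=20):
    """Cluster `points` with a k-means loop guided by labeled seed points.

    points: list of equal-length tuples of floats.
    seeds:  dict mapping a point index to its known cluster label (0..k-1).
            Assumption: every label in 0..k-1 has at least one seed.
    Returns a list holding one cluster label per point.
    """
    k = max(seeds.values()) + 1
    dim = len(points[0])

    def mean(indices):
        # Component-wise mean of the points at the given indices.
        return tuple(
            sum(points[i][d] for i in indices) / len(indices) for d in range(dim)
        )

    # Initialise each centroid from the labeled points of that cluster.
    centroids = [mean([i for i, c in seeds.items() if c == j]) for j in range(k)]

    labels = None
    for _ in range(n_iter):
        # Assignment step: labeled points keep their label,
        # unlabeled points go to the nearest centroid.
        new_labels = [
            seeds[i]
            if i in seeds
            else min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for i, p in enumerate(points)
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        centroids = [
            mean([i for i, c in enumerate(labels) if c == j]) for j in range(k)
        ]
    return labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    # Only two of the six points carry a known label.
    print(seeded_kmeans(data, {0: 0, 3: 1}))
```

Because each cluster always retains its seed points, no cluster can become empty in the update step. Handling pairwise must-link and cannot-link constraints would need an additional check in the assignment step, which this sketch leaves out.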

References

  1. S. Shalini, R. Raja, “An Improved Semi-Supervised Clustering Algorithm Based on Active Learning”, International Journal of Innovative Research in Computer and Communication Engineering, Vol. 2, Special Issue 1, March 2014.
  2. Ashraf Mohammed Iqba, Abidal rahman Moh’d, Zahoor Ali Khan, “Semi-supervised Clustering Ensemble by Voting”, University, Halifax, Canada, 2010.
  3. Aloysius George, “Efficient High Dimension Data Clustering using Constraint-Partitioning K-Means Algorithm”, The International Arab Journal of Information Technology, Vol. 10, No. 5, September 2013.
  4. J. Handl, J. Knowles, “On semi-supervised clustering via multi objective optimization”, Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (GECCO 2006), pp. 1465–1472, 2016.
  5. S. Basu, A. Banerjee, R. J. Mooney, “Active semi-supervision for pairwise constrained clustering”, Proc. SIAM Int. Conf. Data Mining, pp. 1–8, 2014.
  6. W. Tang, H. Xiong, S. Zhong, J. Wu, “Enhancing semi-supervised clustering: a feature projection perspective”, Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 707–716, 2015.
  7. S. Miyamoto, A. Terami, “Semi-supervised agglomerative hierarchical clustering algorithms with pairwise constraints”, Proceedings of the 2010 IEEE International Conference on Fuzzy Systems (FUZZ 2010), pp. 1–6, 2010.

Published

2019-07-30

Section

Research Articles

How to Cite

[1]
M. Pavithra, Dr. R. M. S. Parvathi, "A Review article on Semi-Supervised Clustering Framework for High Dimensional Data", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 5, Issue 4, pp. 102-108, July-August 2019. Available at doi: https://doi.org/10.32628/CSEIT195410