Hybrid SVD For Document Representation Using Different Vectorization

Authors (4): Kalpana P, Rosini B R, Sathya Priya K P, Sowmiya S

Document clustering is the process of segmenting a collection of texts into subgroups. Nowadays documents are largely in electronic form, which makes retrieving relevant documents from a large database a pressing problem. The goal is to transform text written in everyday language into a structured, database-like format, so that different documents are summarized and presented in a uniform manner. The main challenges of document clustering are large volume, high dimensionality, and complex semantics. This paper focuses on clustering multi-sense word embeddings using three algorithms: K-means, DBSCAN, and CURE. Of the three, CURE gives the highest accuracy and handles large databases efficiently.

Authors and Affiliations

Kalpana P
Assistant Professor, Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
Rosini B R
UG Scholar, Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
Sathya Priya K P
UG Scholar, Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
Sowmiya S
UG Scholar, Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India

Keywords: SVD, K-means, DBSCAN, CURE


Publication Details

Published in : Volume 5 | Issue 2 | March-April 2019
Date of Publication : 2019-04-30
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 388-393
Manuscript Number : CSEIT195260
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Kalpana P, Rosini B R, Sathya Priya K P, Sowmiya S, "Hybrid SVD For Document Representation Using Different Vectorization", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 5, Issue 2, pp.388-393, March-April-2019. Available at doi : https://doi.org/10.32628/CSEIT195260
Journal URL : http://ijsrcseit.com/CSEIT195260
