Hybrid SVD For Document Representation Using Different Vectorization

Authors

  • Kalpana P  Assistant Professor, Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
  • Rosini B R  UG Scholar, Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
  • Sathya Priya K P  UG Scholar, Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
  • Sowmiya S  UG Scholar, Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India

DOI:

https://doi.org/10.32628/CSEIT195260

Keywords:

SVD, K-means, DBSCAN, CURE

Abstract

Document clustering is the process of segmenting a collection of text into subgroups. Most documents are now stored in electronic form, which makes it difficult to retrieve the relevant documents from a large database. The goal is to transform text written in everyday language into a structured, database format, so that different documents are summarized and presented in a uniform manner. The main challenges of document clustering are large volume, high dimensionality, and complex semantics. This paper focuses on clustering multi-sense word embeddings using three algorithms: K-means, DBSCAN, and CURE. Of the three, CURE gives better accuracy and can handle large databases efficiently.
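
As a rough illustration of the clustering step described in the abstract, the sketch below groups a few toy documents with plain K-means over normalized term-frequency vectors. It is not the authors' implementation: the sample documents, the function names (`vectorize`, `kmeans`), and the farthest-point initialization are assumptions made for this example, and it omits the SVD step as well as the DBSCAN and CURE comparisons.

```python
import math
from collections import Counter

def vectorize(docs):
    """Turn each document into an L2-normalized term-frequency vector."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Plain K-means with a deterministic farthest-point initialization
    (an assumption of this sketch, chosen so the result is repeatable)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Add the point that is farthest from its nearest chosen centroid.
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: attach each vector to its closest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy corpus: two finance documents and two sports documents.
docs = [
    "stock market prices fall",
    "market prices rise as stock rallies",
    "team wins football match",
    "football team loses match",
]
labels = kmeans(vectorize(docs), k=2)
print(labels)
```

On this toy corpus the two finance documents share one label and the two sports documents share the other. A real pipeline would typically reduce the dimensionality of the vectors (for example with SVD) before clustering.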

Published

2019-04-30

Section

Research Articles

How to Cite

[1]
Kalpana P, Rosini B R, Sathya Priya K P, Sowmiya S, "Hybrid SVD For Document Representation Using Different Vectorization", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 5, Issue 2, pp. 388-393, March-April 2019. Available at doi: https://doi.org/10.32628/CSEIT195260