Systematic Component Clustering Scheme for Collective Objects through Micro-Clusters

M. Shailaja; Dr. S. Vijay Bhanu

doi:10.32628/CSEIT1831201

Authors

M. Shailaja, Ph.D. Scholar, Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar, Chidambaram, Tamilnadu, India
Dr. S. Vijay Bhanu, Professor, Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar, Chidambaram, Tamilnadu, India

Keywords:

Data mining, data stream clustering, density-based clustering, information theory, feature clustering, classification, entropy, Kullback-Leibler divergence, mutual information, Jensen-Shannon divergence.

Abstract

We develop and evaluate a new reclustering method for micro-cluster-based data stream clustering algorithms. We introduce the concept of a shared density graph, which explicitly captures the density of the original data between micro-clusters during clustering, and then show how the graph can be used to recluster the micro-clusters. The approach is novel because, instead of relying on assumptions about the distribution of the data points assigned to a micro-cluster (often a Gaussian distribution around a center), it estimates the density in the region shared by micro-clusters directly from the data. To the best of our knowledge, this paper is the first to propose and investigate a shared-density-based reclustering approach for data stream clustering.

In this paper, we also propose a new information-theoretic divisive algorithm for feature/word clustering and apply it to text classification. Existing techniques for such "distributional clustering" of words are agglomerative and result in (i) sub-optimal word clusters and (ii) high computational cost. To explicitly capture the optimality of word clusters in an information-theoretic framework, we first derive a global criterion for feature clustering. We then present a fast, divisive algorithm that monotonically decreases this objective function. We show that the algorithm minimizes the "within-cluster Jensen-Shannon divergence" while simultaneously maximizing the "between-cluster Jensen-Shannon divergence". Compared with previously proposed agglomerative strategies, our divisive algorithm is much faster and achieves comparable or higher classification accuracies.
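The shared density graph described at the start of the abstract can be illustrated with a small, simplified sketch. It is not the authors' implementation: it assumes hypothetical fixed micro-cluster centers, a hypothetical radius `r`, and a hypothetical merge threshold `alpha`, and it omits micro-cluster creation, center movement, and time-based fading that a real stream clusterer would need. Each point increments the weight of every micro-cluster within `r`, and every pair of micro-clusters covering the same point has its shared-density count incremented. Reclustering then joins micro-clusters whose shared density is a large enough fraction of their average weight and reports connected components as the final clusters.

```python
import math
from collections import defaultdict


def update_micro_clusters(points, centers, r):
    """Count points per micro-cluster and points in each pairwise overlap.

    Hypothetical simplification: centers are fixed, and there is no fading.
    """
    weight = [0] * len(centers)
    shared = defaultdict(int)  # (i, j) with i < j -> points covered by both
    for p in points:
        hits = [i for i, c in enumerate(centers) if math.dist(p, c) <= r]
        for i in hits:
            weight[i] += 1
        # hits is in ascending order, so each key satisfies i < j
        for a in range(len(hits)):
            for b in range(a + 1, len(hits)):
                shared[(hits[a], hits[b])] += 1
    return weight, shared


def recluster(weight, shared, alpha):
    """Merge micro-clusters whose shared density / average weight >= alpha.

    Returns one macro-cluster label (a union-find root) per micro-cluster.
    """
    parent = list(range(len(weight)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), s in shared.items():
        if s / ((weight[i] + weight[j]) / 2) >= alpha:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(weight))]


# Hypothetical toy stream: two overlapping micro-clusters and one far away.
points = [(0.0, 0.0), (0.3, 0.0), (0.45, 0.0), (0.55, 0.0),
          (0.7, 0.0), (1.0, 0.0), (5.0, 0.0), (5.2, 0.0)]
centers = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
weight, shared = update_micro_clusters(points, centers, r=0.6)
print(weight, dict(shared))  # [4, 4, 2] {(0, 1): 2}
print(recluster(weight, shared, alpha=0.3))  # [1, 1, 2]
```

The first two micro-clusters end up in the same macro-cluster because half of their average weight lies in the region they share, while the distant third micro-cluster shares no points with them.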
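The feature-clustering criterion from the second paragraph of the abstract can be sketched as a k-means-style procedure that uses the Kullback-Leibler divergence. This is an illustrative assumption, not the authors' code. Each word t is described by a positive prior p(w_t) and a class distribution p(C | w_t). A cluster's distribution is the prior-weighted mean of its members' distributions, and the objective is the prior-weighted sum of KL(p(C | w_t) || p(C | W_j)) over all words. For each cluster, that sum equals the cluster's prior times the generalized Jensen-Shannon divergence of its members, so lowering it lowers the within-cluster Jensen-Shannon divergence. The round-robin initialization is an arbitrary choice made here for simplicity.

```python
import math


def kl(p, q):
    """KL(p || q) in bits; returns inf if q is 0 where p is positive."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


def fit_centroids(labels, priors, dists):
    """Prior-weighted mean class distribution p(C | W_j) for each non-empty cluster."""
    cents = {}
    for j in sorted(set(labels)):
        members = [t for t, l in enumerate(labels) if l == j]
        mass = sum(priors[t] for t in members)  # cluster prior; priors assumed > 0
        cents[j] = [sum(priors[t] * dists[t][c] for t in members) / mass
                    for c in range(len(dists[0]))]
    return cents


def objective(labels, priors, dists, cents):
    """Sum over words of p(w_t) * KL(p(C | w_t) || p(C | cluster of w_t))."""
    return sum(priors[t] * kl(dists[t], cents[labels[t]]) for t in range(len(labels)))


def cluster_words(priors, dists, k, max_iter=100):
    """Alternate KL-based reassignment and centroid updates; neither step raises the objective."""
    labels = [t % k for t in range(len(priors))]  # assumed round-robin start
    for _ in range(max_iter):
        cents = fit_centroids(labels, priors, dists)
        new_labels = [min(cents, key=lambda j: kl(dists[t], cents[j]))
                      for t in range(len(priors))]
        if new_labels == labels:
            break
        labels = new_labels  # empty clusters simply disappear in this sketch
    cents = fit_centroids(labels, priors, dists)
    return labels, objective(labels, priors, dists, cents)


# Hypothetical toy input: four equally likely words, two classes.
priors = [0.25, 0.25, 0.25, 0.25]
dists = [[0.9, 0.1], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8]]
labels, obj = cluster_words(priors, dists, k=2)
print(labels)  # [0, 0, 1, 1]
```

Starting from a mixed assignment, the words that favor the first class and the words that favor the second class separate after one reassignment. A full implementation would also need a principled initialization and handling of empty clusters.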
We further show that feature clustering is an effective technique for building smaller class models in hierarchical classification. We present detailed experimental results using Naive Bayes and Support Vector Machines on the 20Newsgroups data set and on a three-level hierarchy of HTML documents collected from the Open Directory Project.
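Feature clustering shrinks class models because each class then stores one parameter per word cluster rather than one per vocabulary word. The sketch below illustrates this for a multinomial Naive Bayes classifier. The word-to-cluster mapping, the toy documents, and the Laplace smoothing parameter `alpha` are hypothetical. The sketch does not reproduce the hierarchical setup, the Support Vector Machine models, or the experiments described above.

```python
import math


def train_nb(docs, labels, word_to_cluster, k, alpha=1.0):
    """Train multinomial Naive Bayes on k word-cluster counts instead of raw word counts."""
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: [0] * k for c in classes}
    for doc, c in zip(docs, labels):
        for w in doc:
            if w in word_to_cluster:  # words without a cluster are ignored
                counts[c][word_to_cluster[w]] += 1
    # Laplace-smoothed log P(cluster | class): k numbers per class, not |vocabulary|
    log_lik = {}
    for c in classes:
        total = sum(counts[c]) + alpha * k
        log_lik[c] = [math.log((n + alpha) / total) for n in counts[c]]
    return log_prior, log_lik


def predict_nb(doc, log_prior, log_lik, word_to_cluster):
    """Return the class with the highest log-posterior score for a tokenized document."""
    scores = {
        c: log_prior[c] + sum(log_lik[c][word_to_cluster[w]]
                              for w in doc if w in word_to_cluster)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: six words grouped into two clusters.
word_to_cluster = {"goal": 0, "match": 0, "striker": 0,
                   "vote": 1, "election": 1, "senate": 1}
docs = [["goal", "match"], ["striker", "goal"], ["vote", "election"], ["senate", "vote"]]
labels = ["sports", "sports", "politics", "politics"]
log_prior, log_lik = train_nb(docs, labels, word_to_cluster, k=2)
print(predict_nb(["match", "striker"], log_prior, log_lik, word_to_cluster))  # sports
```

Here each class keeps two log-likelihoods instead of six. With a real vocabulary of tens of thousands of words reduced to a few hundred clusters, the same substitution is what makes the per-class models smaller.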
