Efficiency of Clustering Data Streams Based on Micro-Clusters Shared Density

Authors

  • Avula Chitty, Assistant Professor, Department of CSE, Sri Indu College of Engineering and Technology, Hyderabad, Telangana, India

Keywords:

Data Mining, Data Stream Clustering, Density-Based Clustering

Abstract

As more and more applications produce streaming data, clustering data streams has become an important technique for data and knowledge engineering. A typical approach is to summarize the data stream in real time with an online process into a large number of so-called micro-clusters. Micro-clusters represent local density estimates by aggregating the information of many data points in a defined area. On demand, a (modified) conventional clustering algorithm is used in a second, offline step to recluster the micro-clusters into larger final clusters. For reclustering, the centers of the micro-clusters are used as pseudo points, with the density estimates used as their weights. However, information about the density in the area between micro-clusters is not preserved in the online process, and reclustering relies on possibly inaccurate assumptions about the distribution of data within and between micro-clusters (e.g., uniform or Gaussian). This paper describes DBSTREAM, the first micro-cluster-based online clustering component that explicitly captures the density between micro-clusters via a shared density graph. The density information in this graph is then exploited for reclustering based on the actual density between adjacent micro-clusters. We discuss the space and time complexity of maintaining the shared density graph. Experiments on a wide range of synthetic and real data sets highlight that using shared density improves clustering quality over other popular data stream clustering methods, which require the creation of a larger number of smaller micro-clusters to achieve comparable results.
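
The two-phase idea in the abstract can be illustrated with a minimal Python sketch. It is not the DBSTREAM implementation: the class name `SharedDensitySketch`, the fixed `radius`, and the threshold `alpha` are assumptions made for illustration, and the sketch leaves out details such as fading of old data, movement of micro-cluster centers, and noise removal. It only shows how an online step can count the points that fall into two overlapping micro-clusters, and how an offline step can join micro-clusters using that shared count instead of assumptions about the data distribution.

```python
from collections import defaultdict
from itertools import combinations
import math


class SharedDensitySketch:
    """Toy micro-clustering with a shared density graph (illustrative only)."""

    def __init__(self, radius):
        self.radius = radius            # assumed fixed micro-cluster radius
        self.centers = []               # micro-cluster centers (never moved here)
        self.weights = []               # points absorbed by each micro-cluster
        self.shared = defaultdict(int)  # (i, j), i < j -> points inside both

    def insert(self, point):
        """Online step: absorb one point and update the shared density graph."""
        hits = [i for i, c in enumerate(self.centers)
                if math.dist(point, c) <= self.radius]
        if not hits:
            # No micro-cluster covers the point, so it starts a new one.
            self.centers.append(tuple(point))
            self.weights.append(1)
            return
        for i in hits:
            self.weights[i] += 1
        # A point covered by several micro-clusters adds shared density
        # to every pair of them.
        for i, j in combinations(hits, 2):
            self.shared[(i, j)] += 1

    def recluster(self, alpha):
        """Offline step: join micro-clusters whose shared density, relative
        to their average weight, reaches the assumed threshold alpha."""
        parent = list(range(len(self.centers)))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for (i, j), s in self.shared.items():
            if s / ((self.weights[i] + self.weights[j]) / 2) >= alpha:
                parent[find(i)] = find(j)
        # One label per micro-cluster; equal labels mean the same final cluster.
        return [find(i) for i in range(len(self.centers))]
```

Feeding the sketch a few points that fall between two nearby micro-clusters joins those two into one final cluster, while a distant micro-cluster with no shared points stays separate.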

Published

2017-05-30

Section

Research Articles

How to Cite

[1]
Avula Chitty, "Efficiency of Clustering Data Streams Based on Micro-Clusters Shared Density," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 3, pp. 943-950, May-June 2017.