Clustering of High Dimensional Data Streams by Implementing HPStream Method

Author: C. Kondaiah

Clustering is an important task in mining evolving data streams, because a data stream produces a continuous and potentially unbounded sequence of data points [1]. Such streams collect data from many different devices. Streaming data is also naturally high-dimensional [1]. High-dimensional data streams are frequently very large and may include outliers, which makes them a significant issue in the data mining process. Classification, clustering, and similarity search are all difficult on high-dimensional data. Recently, DBSTREAM, single-scan, and subspace methods have been used to find projected clusters in high-dimensional data sets. These methods are difficult to generalize to high-dimensional data streams because of the huge volume of data generated automatically by the simple transactions of day-to-day life. This paper implements a clustering technique for high-dimensional data streams known as HPStream. The technique combines a fading cluster structure with projection-based clustering. It is continuously updatable, accurate, and scalable in both the number of dimensions and the volume of the data stream, and it offers higher-quality clusters than previous data stream clustering techniques.
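The two ingredients named in the abstract, a fading cluster structure and projected clustering, can be illustrated with a minimal Python sketch. This is not the paper's implementation. The class name `FadingCluster`, the exponential fading function 2^(-decay_rate * elapsed), and the rule of keeping the `k` lowest-variance dimensions per cluster are all assumptions chosen for illustration.

```python
class FadingCluster:
    """Time-decayed per-dimension statistics for one cluster (illustrative only)."""

    def __init__(self, dims, decay_rate):
        self.decay_rate = decay_rate      # assumed fading parameter
        self.weight = 0.0                 # decayed count of absorbed points
        self.linear_sum = [0.0] * dims    # decayed sum of values per dimension
        self.square_sum = [0.0] * dims    # decayed sum of squared values per dimension
        self.last_update = 0.0

    def _fade(self, now):
        # Assumed fading function: statistics halve every 1 / decay_rate time units.
        factor = 2.0 ** (-self.decay_rate * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [v * factor for v in self.linear_sum]
        self.square_sum = [v * factor for v in self.square_sum]
        self.last_update = now

    def add(self, point, now):
        # Decay the old statistics first, then absorb the new point with weight 1.
        self._fade(now)
        self.weight += 1.0
        for j, x in enumerate(point):
            self.linear_sum[j] += x
            self.square_sum[j] += x * x

    def centroid(self):
        # Only meaningful after at least one call to add().
        return [s / self.weight for s in self.linear_sum]

    def variances(self):
        # Weighted variance per dimension, clamped at zero against rounding error.
        return [
            max(q / self.weight - (s / self.weight) ** 2, 0.0)
            for s, q in zip(self.linear_sum, self.square_sum)
        ]

    def projected_dims(self, k):
        # Simplifying assumption: keep the k dimensions where the cluster is tightest.
        var = self.variances()
        return sorted(sorted(range(len(var)), key=var.__getitem__)[:k])
```

Because every statistic is additive and decays by the same factor, the structure can be updated in constant time per dimension as each point arrives, which is the property that makes this kind of summary suitable for a single pass over an unbounded stream.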

Authors and Affiliations

C. Kondaiah
Department of Computer Science, JNTUA, Anantapur, Andhra Pradesh, India

Keywords: Data Stream, High Dimensional Data, Clustering.

  1. Sunita Jahirabadkar, Parag Kulkarni., "Clustering for High Dimensional Data: Density based Subspace Clustering Algorithms", International Journal of Computer Applications (0975 - 8887) Volume 63- No.20, February 2013.
  2. Michael Hahsler, Matthew Bolaños., "Clustering Data Streams Based on Shared Density Between Micro-Clusters", IEEE Transactions On Knowledge And Data Engineering - Preprint, Accepted 1/17/2016.
  3. Lance Parsons, Ehtesham Haque, Huan Liu., "Subspace Clustering for High Dimensional Data: A Review", ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced data sets: Volume 6 Issue 1, June 2004.
  4. Chairukwattana R., Kangkachit T., Rakthanmanon T., Waiyamai K., "Evolution-Based Clustering of High Dimensional Data Streams with Dimension Projection", Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 245. Springer, 2014.
  5. Feng Cao, Martin Ester, Weining Qian, Aoying Zhou., "Density-Based Clustering over an Evolving Data Stream with Noise", SIAM International Conference on Data Mining, 2006.
  6. Levent Ertoz, Michael Steinbach, Vipin Kumar., "A New Shared Nearest Neighbor Clustering Algorithm and its Applications", Workshop on Clustering High Dimensional Data and its Applications at 2nd SIAM International Conference on Data Mining, 2002.
  7. S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan, "Clustering data streams," in Proc. ACM Symp. Found. Comput. Sci., 12-14 Nov. 2000, pp. 359-366.
  8. C. Aggarwal, Data Streams: Models and Algorithms, (series Advances in Database Systems). New York, NY, USA: Springer-Verlag, 2007.
  9. J. Gama, Knowledge Discovery from Data Streams, 1st Ed. London, U.K.: Chapman & Hall, 2010.
  10. Y. Chen and L. Tu, "Density-based clustering for real-time stream data," in Proc. 13th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2007, pp. 133-142.
  11. L. Wan, W. K. Ng, X. H. Dang, P. S. Yu, and K. Zhang, "Density-based clustering of data streams at multiple resolutions," ACM Trans. Knowl. Discovery from Data, vol. 3, no. 3, pp. 1-28, 2009.
  12. Amineh Amini, Teh Ying Wah., "Density Micro-Clustering Algorithms on Data Streams: A Review", Proceedings of the International MultiConference of Engineers and Computer Scientists 2011, vol. 1, IMECS, March 16-18, 2011, Hong Kong.
  13. http://en.wikipedia.org/wiki/Eigenface.

Publication Details

Published in : Volume 2 | Issue 4 | July-August 2017
Date of Publication : 2017-08-31
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 524-529
Manuscript Number : CSEIT1724138
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

C. Kondaiah, "Clustering of High Dimensional Data Streams by Implementing HPStream Method", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 4, pp. 524-529, July-August 2017.
Journal URL : http://ijsrcseit.com/CSEIT1724138
