Clustering of large datasets using Hadoop Ecosystem

Authors

  • Mounica B  Department of Information Science, New Horizon College of Engineering, Bangalore, Karnataka, India
  • Aditya Srivastava  Department of Information Science, New Horizon College of Engineering, Bangalore, Karnataka, India
  • Md.Faisal Alam  Department of Information Science, New Horizon College of Engineering, Bangalore, Karnataka, India

Keywords:

Hadoop, MapReduce, K-means.

Abstract

With technology advancing rapidly, the amount of data being generated and used is very high. Data is produced so quickly that the rate is not easy to measure. Existing data processing techniques are not capable of processing data sets this large. K-means is a traditional clustering method that is easy to implement, but it converges to a local minimum that depends on its starting position, so it is sensitive to the initial clusters. The Hadoop Distributed File System (HDFS), the storage layer of Hadoop, is a distributed file system that is highly fault tolerant and can be deployed on low-cost hardware. It provides complete access to data for any operation and is suitable for applications that need large data sets. Hadoop is used to process large data sets in parallel in less time.
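
The combination described above can be illustrated with a minimal sketch of K-means written as a map step and a reduce step. This is not the authors' implementation and it does not use Hadoop: it runs in memory in plain Python, and the names `map_point`, `reduce_cluster`, `kmeans_iteration`, and `kmeans` are hypothetical, chosen only for illustration. On a cluster, the map step would run in parallel over blocks of points and the reduce step would run once per centroid.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One pass: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)
    # Assumption: a centroid that receives no points keeps its old position.
    return [
        reduce_cluster(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=100):
    """Repeat iterations until the centroids stop moving."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_start = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
poor_start = kmeans(points, [(0.0, 0.0), (0.0, 2.0)])
```

With these four points, the two starting positions converge to different centroids: the first run separates the left pair from the right pair, while the second splits the data into a lower and an upper row, which is a poorer local minimum. This is the sensitivity to initial clusters that the abstract refers to.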

Published

2017-06-30

Section

Research Articles

How to Cite

[1]
Mounica B, Aditya Srivastava, Md.Faisal Alam, "Clustering of large datasets using Hadoop Ecosystem", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 3, pp.127-131, May-June-2017.