Clustering of large datasets using Hadoop Ecosystem

Mounica B; Aditya Srivastava; Md.Faisal Alam

doi:10.32628/CSEIT1722398

Authors

Mounica B, Department of Information Science, New Horizon College of Engineering, Bangalore, Karnataka, India
Aditya Srivastava, Department of Information Science, New Horizon College of Engineering, Bangalore, Karnataka, India
Md.Faisal Alam, Department of Information Science, New Horizon College of Engineering, Bangalore, Karnataka, India

Keywords:

Hadoop, MapReduce, K-means.

Abstract

Technology is advancing rapidly, and the amount of data being generated and used is very large. Data is produced at a rate that is difficult to measure, and existing data processing techniques cannot handle datasets of this size. K-means is a traditional clustering method that is easy to implement, but it converges to a local minimum that depends on the starting position, so its results are sensitive to the choice of initial clusters. The Hadoop Distributed File System (HDFS), the storage layer of Hadoop, is a distributed file system that is highly fault tolerant and can be deployed on low-cost hardware. It provides high-throughput access to data and is suitable for applications that need large data sets. Hadoop is used to process large data sets in parallel, which reduces processing time.
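
As an illustration of how K-means fits the MapReduce model named in the keywords, the following is a minimal single-machine sketch in plain Python. It is not the authors' implementation and does not use any Hadoop API. The function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`) and the sample points are hypothetical. The map step assigns each point to its nearest centroid, and the reduce step averages the points assigned to each centroid. In a real Hadoop job the grouping between the two steps would be done by the framework's shuffle phase.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def kmeans_reduce(points):
    """Reduce step: the new centroid is the mean of the assigned points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One map/shuffle/reduce round over all points."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # Assumption: a centroid with no assigned points is left unchanged.
    return [
        kmeans_reduce(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=100):
    """Repeat rounds until the centroids stop moving or max_iter is reached."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(sample, [(0, 0), (10, 10)]))
```

Because the result depends on the centroids passed in, a common way to reduce the sensitivity to initial clusters is to run the procedure from several starting positions and keep the clustering with the lowest total within-cluster distance.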
