Clustering of large datasets using Hadoop Ecosystem

Authors (3): Mounica B, Aditya Srivastava, Md.Faisal Alam

With the rapid advancement of technology, the amount of data being generated and used is very large, and the rate of data production is high and difficult to measure. Existing data processing techniques are not capable of processing data sets of this size. K-means is a traditional clustering method that is easy to implement, but it can converge to a local minimum depending on its starting position and is sensitive to the choice of initial clusters. The Hadoop Distributed File System (HDFS), the storage layer of Hadoop, is a distributed file system that is highly fault tolerant and can be deployed on low-cost hardware. It provides high-throughput access to data and is suitable for applications that need large data sets. Hadoop is used to process large data sets in parallel, in less time.
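To illustrate how K-means fits the MapReduce model mentioned in the abstract, the following is a minimal single-machine sketch in Python, not the authors' implementation. It assumes points are numeric tuples compared with Euclidean distance. The function names (`map_point`, `reduce_cluster`, `kmeans_iteration`, `kmeans`) are hypothetical, and a real Hadoop job would run the map and reduce steps on separate nodes and read its input from HDFS.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centers):
    """Map step (hypothetical helper): emit (index of nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step (hypothetical helper): average the points of one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centers):
    """One K-means pass expressed as map, group-by-key, reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, centers)
        groups[key].append(value)
    # A center that received no points is kept unchanged (an assumption
    # of this sketch; other strategies exist).
    return [reduce_cluster(groups[i]) if i in groups else centers[i]
            for i in range(len(centers))]


def kmeans(points, centers, max_iter=100):
    """Repeat iterations until the centers stop moving or max_iter is hit."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(points, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = kmeans(points, [(0, 0), (10, 0)])  # one seed per natural group
poor = kmeans(points, [(0, 0), (0, 2)])   # both seeds in the same group
```

With these four points, the first initialization settles on centers at (0, 1) and (10, 1), while the second settles on (5, 0) and (5, 2), a stable but much worse partition. This is the sensitivity to initial clusters that the abstract describes.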

Authors and Affiliations

Mounica B
Department of Information Science, New Horizon College of Engineering, Bangalore, Karnataka, India
Aditya Srivastava
Department of Information Science, New Horizon College of Engineering, Bangalore, Karnataka, India
Md.Faisal Alam
Department of Information Science, New Horizon College of Engineering, Bangalore, Karnataka, India

Keywords: Hadoop, MapReduce, K-means.

Publication Details

Published in : Volume 2 | Issue 3 | May-June 2017
Date of Publication : 2017-06-30
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 127-131
Manuscript Number : CSEIT1722398
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Mounica B, Aditya Srivastava, Md.Faisal Alam, "Clustering of large datasets using Hadoop Ecosystem", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 3, pp.127-131, May-June 2017
URL : http://ijsrcseit.com/CSEIT1722398
