Partitional Based Clustering Algorithms on Big Data Using Apache Spark

Authors

  • N. Sukumar  M.Tech Scholar, Department of CSE, JNTUA College of Engineering, Ananthapuramu, Andhra Pradesh, India
  • Prof. A. Ananda Rao  Professor, Department of CSE, JNTUA College of Engineering, Ananthapuramu, Andhra Pradesh, India
  • Dr. P. Radhika Raju  Ad-hoc Assistant Professor, Department of CSE, JNTUA College of Engineering, Ananthapuramu, Andhra Pradesh, India

Keywords

Clustering Technique, Iterative Computation, Apache Spark, Membership Matrix, PySpark

Abstract

Apache Spark, a framework similar to the Von Neumann architecture, provides an efficient implementation of in-memory computation and processes iterative optimization to analyze large volumes of data. Data captured at high velocity and from a variety of sources is known as Big Data. Such big data can be partitioned and clustered based on parameters of the data, and the parameterized clusters are refined by clustering algorithms for better outcomes. In this paper, the proposed approach optimizes computation over random sampling algorithms, where empirical evidence exhibits a significant change in the computation of partitional algorithms. Computations are carried out in an iterative procedure over a wide variety of datasets by retaining an abstraction known as Resilient Distributed Datasets (RDDs).
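The iterative, in-memory pattern the abstract describes can be sketched with a plain-Python partitional clustering loop (Lloyd's k-means): each pass "maps" points to their nearest centroid and "reduces" each group to a new centroid, the same two steps a PySpark RDD implementation repeats over cached data. This is a minimal illustrative sketch, not the paper's algorithm; all names, data, and parameters below are assumed for the example.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means: the partitional clustering loop that, on Spark,
    would run each iteration over a cached RDD of points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # illustrative init: k distinct points
    for _ in range(iters):
        # "map" step: assign each point to its nearest centroid
        buckets = {i: [] for i in range(k)}
        for x, y in points:
            i = min(range(k),
                    key=lambda c: (x - centroids[c][0]) ** 2
                                + (y - centroids[c][1]) ** 2)
            buckets[i].append((x, y))
        # "reduce" step: recompute each centroid as the mean of its bucket
        for i, pts in buckets.items():
            if pts:
                centroids[i] = (sum(p[0] for p in pts) / len(pts),
                                sum(p[1] for p in pts) / len(pts))
    return centroids

# toy data: two well-separated groups of points
pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centers = kmeans(pts, k=2)
```

On Spark, the assignment step would be an RDD `map` and the centroid update a `reduceByKey`, with the input RDD marked `.cache()` so successive iterations reuse the in-memory partitions instead of re-reading the data.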

References

  1. N. Bharill, A. Tiwari, and Aayushi, "Fuzzy Based Scalable Clustering Algorithms for Handling Big Data Using Apache Spark," IEEE Transactions on Big Data, 2016.
  2. R. Krishnapuram, A. Joshi, O. Nasraoui, and L. Yi, "Low complexity fuzzy relational clustering algorithms for web mining," IEEE Transactions on Fuzzy Systems, vol. 9, no. 4, pp. 595-607, 2001.
  3. T. C. Havens, J. C. Bezdek, C. Leckie, L. O. Hall, and M. Palaniswami, "Fuzzy c-means algorithms for very large data," IEEE Transactions on Fuzzy Systems, vol. 20, no. 6, pp. 1130-1146, 2012.
  4. J. F. Kolen and T. Hutcheson, "Reducing the time complexity of the fuzzy c-means algorithm," IEEE Transactions on Fuzzy Systems, vol. 10, no. 2, pp. 263-267, 2002.
  5. J. Fu, J. Sun, and K. Wang, "SPARK—A Big Data Processing Platform for Machine Learning," IEEE Conference, 2017.
  6. J. Swarndeep Saket and S. Pandya, "An Overview of Partitioning Algorithms in Clustering Techniques," IJARCET, June 2016.
  7. S. Hong, W. Choi, and W.-K. Jeong, "GPU in-memory processing using Spark for iterative computation," 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2017.
  8. Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst, "HaLoop: Efficient Iterative Data Processing on Large Clusters," Proceedings of the VLDB Endowment, 2010.

Published

2018-06-30

Issue

Section

Research Articles

How to Cite

[1]
N. Sukumar, A. Ananda Rao, and P. Radhika Raju, "Partitional Based Clustering Algorithms on Big Data Using Apache Spark," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 5, pp. 833-838, May-June 2018.