Partitional Based Clustering Algorithms on Big Data Using Apache Spark

Authors (3): N. Sukumar, Prof. A. Ananda Rao, Dr. P. Radhika Raju

Apache Spark is a framework, conceptually similar to the Von Neumann architecture, that provides an efficient implementation of in-memory computation and iterative optimization for analysing large volumes of data. Data captured at high velocity and from a variety of different sources is known as Big Data. Such data can be partitioned and clustered based on its parameters, and the resulting parameterized clusters are refined by clustering algorithms to produce better outcomes. In this paper, the proposed approach optimizes computation over random sampling algorithms, and empirical evidence shows a significant improvement in the computation of partitional algorithms. Computations are carried out as an iterative procedure over a wide variety of datasets by retaining an in-memory abstraction known as Resilient Distributed Datasets (RDDs).
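
To make the iterative, in-memory RDD workflow concrete, the sketch below runs a small partition-style (k-means) clustering loop in PySpark. It is a minimal illustration and not the authors' implementation: the toy points, k = 2, the sampling seed, and the convergence threshold are all assumed values chosen for demonstration.

import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="rdd-kmeans-sketch")

# Toy 2-D points; a real job would load them from HDFS or another distributed store.
points = sc.parallelize(
    [np.array(p) for p in
     [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]]
).cache()  # cache() keeps the partitions in memory across iterations

k = 2            # number of clusters (illustrative)
epsilon = 1e-4   # convergence threshold (illustrative)
centers = points.takeSample(False, k, seed=42)

def closest_center(p, cs):
    # Index of the nearest center to point p under squared Euclidean distance.
    return min(range(len(cs)), key=lambda i: float(np.sum((p - cs[i]) ** 2)))

delta = float("inf")
while delta > epsilon:
    # Map step: assign every point to its nearest center.
    assigned = points.map(lambda p: (closest_center(p, centers), (p, 1)))
    # Reduce step: sum points and counts per cluster, then average to get new centers.
    sums = assigned.reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
    new_map = sums.mapValues(lambda s: s[0] / s[1]).collectAsMap()
    new_centers = [new_map.get(i, centers[i]) for i in range(k)]
    # Total movement of the centers decides convergence.
    delta = sum(float(np.sum((centers[i] - new_centers[i]) ** 2)) for i in range(k))
    centers = new_centers

print("Final cluster centers:", centers)
sc.stop()

Because the input RDD is cached, each pass over the data reuses the in-memory partitions instead of re-reading them, which is the property the abstract relies on for iterative clustering over large datasets.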

Authors and Affiliations

N. Sukumar
M.Tech Scholar, Department of CSE, JNTUA College of Engineering, Ananthapuramu, Andhra Pradesh, India
Prof. A. Ananda Rao
Professor, Department of CSE, JNTUA College of Engineering, Ananthapuramu, Andhra Pradesh, India
Dr. P. Radhika Raju
Ad-hoc Assistant Professor, Department of CSE, JNTUA College of Engineering, Ananthapuramu, Andhra Pradesh, India

Clustering Technique, Iterative Computation, Apache Spark, Membership Matrix, PySpark

Publication Details

Published in : Volume 3 | Issue 5 | May-June 2018
Date of Publication : 2018-06-30
License : This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 833-838
Manuscript Number : CSEIT1835189
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

N. Sukumar, Prof. A. Ananda Rao, Dr. P. Radhika Raju, "Partitional Based Clustering Algorithms on Big Data Using Apache Spark", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 3, Issue 5, pp.833-838, May-June-2018.
Journal URL : http://ijsrcseit.com/CSEIT1835189
