Partitional Based Clustering Algorithms on Big Data Using Apache Spark
Keywords:
Clustered Technique, Iterative Computation, Apache Spark, Membership Matrix, PySpark
Abstract
Apache Spark is a framework similar to the Von Neumann architecture. It provides an efficient implementation of in-memory computation and iterative optimization for analyzing large volumes of data. Data captured at high velocity and from a variety of sources is known as Big Data. Such data can be partitioned and clustered based on its parameters, and the resulting parameterized clusters are refined by clustering algorithms to improve the outcome. In this paper, the proposed approach optimizes computation relative to random sampling algorithms, and empirical evidence shows a significant change in the computation of partitional algorithms. Computations can be carried out iteratively on a wide variety of datasets using an abstraction known as Resilient Distributed Datasets (RDDs).
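As a hedged illustration of the iterative, partitional computation described above, the sketch below runs k-means as repeated map and reduce-by-key phases. K-means is chosen here only as a representative partitional algorithm; this abstract does not specify the paper's exact algorithm. It uses plain Python lists so it runs without a Spark cluster, and all names (`closest`, `add_pairs`, `reduce_by_key`, `kmeans_step`, `kmeans`) are hypothetical; `reduce_by_key` is a local stand-in for PySpark's `RDD.reduceByKey`, not the Spark API itself.

```python
from math import dist  # Euclidean distance, Python 3.8+


def closest(point, centroids):
    """Return the index of the centroid nearest to `point`."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def add_pairs(a, b):
    """Combine two (coordinate_sum, count) pairs; associative and commutative."""
    (s1, n1), (s2, n2) = a, b
    return tuple(x + y for x, y in zip(s1, s2)), n1 + n2


def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey: fold values per key."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out


def kmeans_step(points, centroids):
    """One k-means iteration as a map phase followed by a reduce-by-key phase."""
    # Map: tag each point with its nearest centroid and a count of 1.
    mapped = [(closest(p, centroids), (p, 1)) for p in points]
    # Reduce: per cluster, sum the coordinates and the counts.
    totals = reduce_by_key(mapped, add_pairs)
    # New centroid = mean of its points; an empty cluster keeps its old centroid.
    return [
        tuple(x / totals[i][1] for x in totals[i][0]) if i in totals else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Iterate kmeans_step until no centroid moves more than `tol`."""
    for _ in range(max_iter):
        updated = kmeans_step(points, centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans(points, [(0.0, 0.0), (10.0, 10.0)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

On Spark, the points would typically be held in an RDD cached with `rdd.cache()`, so each iteration reuses the in-memory partitions instead of rereading the input; the map phase would become `rdd.map(...)` and the reduce phase `rdd.reduceByKey(add_pairs)`, with the small centroid list shipped to the workers (for example, as a broadcast variable). This reuse of cached data across iterations is what makes in-memory frameworks attractive for iterative clustering.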
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.