Clustering Social Networking Data With K-Means Algorithm Using R Language
DOI:
https://doi.org/10.32628/CSEIT24104105
Keywords:
Social Networking, Clustering, R Programming, Cloudera, K-Means, Big Data
Abstract
The main objective of this research is to report detailed empirical studies of sequential and parallel algorithms for diverse clustering tasks executed on very large social network datasets using memory-efficient out-of-core approaches. We evaluate the Spark implementation for R on Cloudera, applying clustering algorithms such as k-means and hierarchical clustering to social media review data in order to rank these algorithms. The implementation uses the YouTube dataset from the UCI Machine Learning Repository. Our goal is to compare a small set of algorithms to determine how accurately each model performs. Ultimately, we aim to test and rank clustering methods, and to mine and cluster massive amounts of unstructured data.
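As an illustration of the k-means step named in the abstract, the following is a minimal single-machine sketch in plain Python. It is not the authors' SparkR implementation on Cloudera, and it does not use the YouTube dataset. The function names (`sq_dist`, `farthest_first`, `kmeans`, `wcss`) and the toy points are assumptions made for this example. Farthest-first seeding is substituted for the usual random initialization so that the result is deterministic. The within-cluster sum of squares (WCSS) is shown as one possible quantity for comparing clusterings; the abstract does not state which measure the study uses.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps
    until the labels stop changing or max_iters is reached."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centers


def wcss(points, labels, centers):
    """Within-cluster sum of squares; lower means tighter clusters."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centers = kmeans(points, k=2)
print(labels)
print(wcss(points, labels, centers))
```

Running the same data with different values of `k`, or with a different clustering method, and comparing the resulting WCSS is one simple way to rank the outcomes. A distributed setting such as Spark would partition the assignment step across workers.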
License
Copyright (c) 2024 International Journal of Scientific Research in Computer Science, Engineering and Information Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.