An in-Depth Review of Big Data Analytic Models for Clustering Operations
Keywords:
Big Data Analytics, Clustering Operations, Data Complexity, Analytic Models, Empirical Evaluation Process
Abstract
The ever-increasing volume, velocity, and variety of data in today's data-driven decision-making demand new ways of extracting insight. Clustering techniques, a key component of data analysis, are essential for identifying latent patterns and structures in huge datasets. This work reviews big data analytic models for clustering operations, in response to the pressing need to harness big data analytics for effective and accurate clustering. The need for this effort stems from the growing data volume and complexity that characterise modern information ecosystems. While effective on smaller datasets, conventional clustering approaches fall short when faced with the enormous datasets typical of contemporary applications. As a result, choosing and applying the right big data analytic model for clustering has become a crucial task for both researchers and practitioners. The review proceeds in stages. The first stage is a literature review in which a wide range of big data analytic models is methodically examined. These models cover a broad range of strategies, from hierarchical and model-based approaches to density-based and partitioning techniques. This assessment, which analyses the characteristics and performance measures of each model in detail, lays the foundation for future research. The empirical assessment considers a wide range of factors, including accuracy, computational complexity, scalability, and suitability for different application areas. Examining each model's performance across these criteria reveals its strengths and limitations.
This not only promotes a thorough understanding of the models' capabilities but also equips practitioners to choose the model best suited to their clustering tasks.
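To make the evaluation criteria named above concrete, the following is a minimal sketch of how accuracy and scalability might be measured for one partitioning method. It is not the procedure used in the reviewed studies. The choice of plain k-means, the purity score as the accuracy measure, the synthetic two-dimensional blobs, and all function names (`kmeans`, `purity`, `make_blobs`, `evaluate`) are assumptions made for illustration only.

```python
import random
import time


def make_blobs(n_per_blob, centers, spread=0.5, seed=0):
    """Hypothetical helper: Gaussian blobs around the given 2-D centers."""
    rng = random.Random(seed)
    points, truth = [], []
    for idx, (cx, cy) in enumerate(centers):
        for _ in range(n_per_blob):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            truth.append(idx)
    return points, truth


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on (x, y) tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2
                + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


def purity(labels, truth):
    """Fraction of points lying in the majority true class of their cluster."""
    total = 0
    for c in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == c]
        total += max(members.count(t) for t in set(members))
    return total / len(labels)


def evaluate(n_per_blob):
    """Return (purity, wall-clock seconds) for one synthetic dataset size."""
    points, truth = make_blobs(n_per_blob, [(0, 0), (10, 0), (0, 10)])
    start = time.perf_counter()
    labels = kmeans(points, k=3)
    elapsed = time.perf_counter() - start
    return purity(labels, truth), elapsed


if __name__ == "__main__":
    # Growing the dataset shows how runtime scales alongside accuracy.
    for n in (100, 400, 1600):
        acc, secs = evaluate(n)
        print(f"n={3 * n:5d}  purity={acc:.3f}  time={secs:.4f}s")
```

Running the script prints one line per dataset size, so accuracy and runtime can be read side by side. Because k-means starts from random centers, the purity reported for a given seed may reflect a poor local optimum. A fuller comparison would repeat the run over several seeds and over the other model families mentioned in the abstract.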
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.