Enhanced Resource Efficiency for Association Rule Mining in Cloud Environments via Apache Spark

Md Sohrab Ansari; Prof. Vinod Mahor

doi:10.32628/CSEIT2390679

Authors

Md Sohrab Ansari, Research Scholar, Department of Computer Science & Engineering, Millennium Institute of Technology and Science, Bhopal, India
Prof. Vinod Mahor, Assistant Professor, Department of Computer Science & Engineering, Millennium Institute of Technology and Science, Bhopal, India

Keywords:

Association Rule Mining, Cloud System, FP-Growth, Big Data.

Abstract

Data from sources such as mobile devices, sensors, and webcams accumulates continuously and must be analyzed at Big Data scale. The processed data is valuable in research, business, and industry. Apache Spark is a versatile Big Data platform for both batch and real-time stream processing, and cloud computing supplies the resources that streaming applications need to meet real-time processing demands. Running Big Data applications in a virtualized cloud environment, however, may cause performance issues that affect streaming workloads. Association Rule Mining (ARM) is a technique that analyzes the relationships between items to identify strongly associated patterns in itemsets. FP-Growth is the most widely used ARM algorithm for discovering frequent patterns quickly. The aim of this research is to improve the efficiency of Association Rule Mining when generating rules for large data sets in Big Data environments. The proposed solution improves association rule efficiency by applying the FP-Growth algorithm in a Hadoop MapReduce setting, and introduces a parallel FP-Growth method in the Spark framework, implemented in Spark on OpenStack. This article covers the OpenStack architecture, requirements, configuration, and problems. The analysis evaluates resource utilization at full load and at no load, and evaluates performance under virtual resource allocation. Efficient use of Spark resources through heterogeneous allocation reduces turnaround time and optimizes cost.
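
To make the ARM terminology in the abstract concrete, the following is a minimal Python sketch of how support and confidence define frequent itemsets and association rules. It is not the authors' implementation and it is not FP-Growth: it enumerates candidate itemsets by brute force, which FP-Growth avoids by compressing transactions into a prefix tree. The function names, the toy transactions, and the thresholds are all assumptions chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support.

    Brute-force enumeration for illustration only (hypothetical helper);
    FP-Growth reaches the same result without generating candidates.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support = fraction of transactions that contain the candidate.
            support = sum(1 for t in transactions if cset <= t) / n
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:
            # No frequent itemset of this size means none of a larger size.
            break
    return result


def association_rules(itemsets, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in itemsets.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Confidence = support(A and B together) / support(A).
                confidence = support / itemsets[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


# Toy market-basket data (assumed, not taken from the paper).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

itemsets = frequent_itemsets(transactions, min_support=0.5)
for antecedent, consequent, support, confidence in association_rules(itemsets, 0.7):
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

On a Spark cluster the same two thresholds would typically be passed to a distributed implementation, for example the `FPGrowth` estimator in `pyspark.ml.fpm` with its `minSupport` and `minConfidence` parameters. Whether the paper uses that particular API is not stated in the abstract.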

