Automating Data Pipelines with AI for Scalable, Real-Time Process Optimization in the Cloud
DOI:
https://doi.org/10.32628/CSEIT242612405
Keywords:
Data Pipeline Automation, Artificial Intelligence, Cloud Computing, Real-time Processing, Stream Analytics, Machine Learning, ETL Optimization
Abstract
Modern data processing environments demand efficient, scalable solutions for handling massive data streams in real time, yet traditional Extract, Transform, Load (ETL) pipelines are limited in processing speed and adaptability. This article presents an AI-Enhanced Cloud Data Pipeline (AECDP) framework that combines Deep Learning-based Stream Processing (DLSP) with Adaptive Resource Management (ARM) for real-time data optimization. The framework introduces novel algorithms for stream processing, resource allocation, and quality assurance, including the Adaptive Stream Processing Algorithm (ASPA) and the Anomaly Detection and Correction (ADC) system. The implementation uses a multi-cloud architecture with containerized microservices, so that pipeline components can be scaled and maintained independently. Experimental results demonstrate the framework's effectiveness across several industry applications, including the e-commerce, financial services, and manufacturing sectors. The system achieves consistent sub-second latency for real-time processing, linear throughput scaling, and optimal resource utilization across cloud instances. The framework also incorporates advanced security features and automated quality monitoring, ensuring robust and reliable data processing. The AECDP framework represents a significant advance in data pipeline automation, giving organizations a comprehensive solution for managing complex data processing requirements while maintaining high performance and reliability.
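The abstract names an Anomaly Detection and Correction (ADC) system but does not describe its internals. The following is a minimal illustrative sketch of the general idea of detecting and correcting anomalies in a stream, and should not be read as the authors' algorithm. It assumes a simple rolling z-score rule; the function name `detect_and_correct`, the `window` and `threshold` parameters, and the choice to replace a flagged reading with the window mean are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, Tuple


def detect_and_correct(
    stream: Iterable[float],
    window: int = 5,
    threshold: float = 3.0,
) -> Iterator[Tuple[float, bool]]:
    """Yield (value, was_corrected) for each reading in the stream.

    Assumption (not from the paper): a reading is anomalous when it lies
    more than `threshold` standard deviations from the mean of the last
    `window` accepted values; it is then replaced by that mean.
    """
    recent = deque(maxlen=window)  # sliding window of accepted values
    for x in stream:
        if len(recent) == window:  # only judge once the window is full
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(x - mu) > threshold * sigma:
                recent.append(mu)  # keep the corrected value in the window
                yield mu, True
                continue
        recent.append(x)
        yield x, False


# Hypothetical sensor readings with one obvious spike.
readings = [10.0, 10.2, 9.8, 10.1, 9.9, 50.0, 10.0]
results = list(detect_and_correct(readings))
for value, corrected in results:
    print(f"{value:.2f} corrected={corrected}")
```

Because the window only holds accepted or corrected values, a single spike does not inflate the statistics used to judge the readings that follow it. A production system would likely need a more robust estimator and a policy for sustained level shifts, which this sketch does not attempt.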
License
Copyright (c) 2024 International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.