Automated Data Preparation through Deep Learning: A Novel Framework for Intelligent Data Cleansing and Standardization
DOI:
https://doi.org/10.32628/CSEIT241061231
Keywords:
Automated Data Preparation, Artificial Intelligence, Machine Learning, Data Quality Management, Intelligent Data Cleansing
Abstract
This article presents a comprehensive framework for automating data preparation and cleansing processes using artificial intelligence techniques. The proposed approach combines supervised and unsupervised learning methods with natural language processing to address common data quality challenges, including inconsistencies, missing values, and format standardization. By integrating deep neural networks for pattern recognition, ensemble methods for enhanced accuracy, and knowledge graphs for domain-specific expertise, the framework demonstrates significant improvements in both data quality and processing efficiency compared to traditional manual approaches. The system's architecture incorporates multiple layers of validation and quality assurance mechanisms, ensuring robust and reliable outputs while reducing human intervention in the data preparation pipeline. Experimental results across various datasets and use cases indicate substantial reductions in processing time and improved accuracy in anomaly detection and correction, while maintaining scalability for large-scale implementations. This article contributes to the growing field of automated data science by providing a scalable, intelligent solution that enables data scientists and analysts to focus on higher-value analytical tasks while ensuring consistent and high-quality data preparation.
License
Copyright (c) 2024 International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.