Automated Data Preparation through Deep Learning: A Novel Framework for Intelligent Data Cleansing and Standardization

Authors

  • Praneeth Thoutam, Fitbit, USA

DOI:

https://doi.org/10.32628/CSEIT241061231

Keywords:

Automated Data Preparation, Artificial Intelligence, Machine Learning, Data Quality Management, Intelligent Data Cleansing

Abstract

This article presents a comprehensive framework for automating data preparation and cleansing processes using artificial intelligence techniques. The proposed approach combines supervised and unsupervised learning methods with natural language processing to address common data quality challenges, including inconsistencies, missing values, and format standardization. By integrating deep neural networks for pattern recognition, ensemble methods for enhanced accuracy, and knowledge graphs for domain-specific expertise, the framework demonstrates significant improvements in both data quality and processing efficiency compared to traditional manual approaches. The system's architecture incorporates multiple layers of validation and quality assurance mechanisms, ensuring robust and reliable outputs while reducing human intervention in the data preparation pipeline. Experimental results across various datasets and use cases indicate substantial reductions in processing time and improved accuracy in anomaly detection and correction, while maintaining scalability for large-scale implementations. This article contributes to the growing field of automated data science by providing a scalable, intelligent solution that enables data scientists and analysts to focus on higher-value analytical tasks while ensuring consistent and high-quality data preparation.

Published

18-12-2024

Section

Research Articles