Orchestrating Dynamic Big Data End to End ETL Pipeline

Authors

  • Syed Azimuddin Inamdar  Department of Computer Science Engineering, VTU, SECAB Institute of Engineering and Technology, Vijayapura, Karnataka, India
  • Sayyid Abrar  Department of Computer Science Engineering, VTU, SECAB Institute of Engineering and Technology, Vijayapura, Karnataka, India
  • Gayatri Bajantri  Department of Computer Science Engineering, VTU, SECAB Institute of Engineering and Technology, Vijayapura, Karnataka, India

Keywords:

Big data, ETL pipeline.

Abstract

Data is now often described as the new currency and a key to success. Gathering high-quality information from numerous sources dispersed across the world takes considerable effort and time, and many further challenges arise while moving that information from its origin to its destination. ETL (extract, transform, load) data pipelines are employed to improve the overall effectiveness of the flow of data from its source to its final destination; because they are automated, they also reduce human involvement. Despite existing studies on ETL pipelines, research on this topic remains limited. ETL pipelines can be viewed as conceptual representations of end-to-end data pipelines. To exploit the full potential of a data pipeline, we need to identify the activities taking place within it and how they relate to one another across the end-to-end pipeline. This work gives an overview of the design of a conceptual model of a data pipeline, which may further serve as a means of communication among data teams.

Published

2021-08-26

Section

Research Articles

How to Cite

[1]
Syed Azimuddin Inamdar, Sayyid Abrar, Gayatri Bajantri, "Orchestrating Dynamic Big Data End to End ETL Pipeline," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 8, Issue 5, pp. 47-53, September-October 2021.