Hadoop Periodic Jobs Using Data Blocks to Achieve Efficiency

Sujit Roy; Subrata Kumar Das; Indrani Mandal

doi:10.32628/CSEIT183320

Authors

Sujit Roy, Department of Computer Science and Engineering, Jatiya Kabi Kazi Nazrul Islam University, Trishal, Mymensingh, Bangladesh
Subrata Kumar Das, Department of Computer Science and Engineering, Jatiya Kabi Kazi Nazrul Islam University, Trishal, Mymensingh, Bangladesh
Indrani Mandal, Department of Computer Science and Engineering, Jatiya Kabi Kazi Nazrul Islam University, Trishal, Mymensingh, Bangladesh

Keywords:

Hadoop, MapReduce, Data efficiency, Data blocks, HDFS.

Abstract

Hadoop has become a powerful, fault-tolerant platform for managing, processing, and analyzing very large datasets. It is widely used to access big data because it is efficient, scalable, and well supported by large vendor and user communities. This paper proposes a new approach to processing data in Hadoop that improves processing efficiency through synchronous data transmission, in which blocks of data are sent from source to destination. We show how to divide the data blocks to achieve optimal efficiency, either by adjusting the split size or by using an appropriately sized set of workers. Because the effective Hadoop hardware configuration matches the requirements of each periodic task, the system can divide its data blocks so as to increase both data efficiency and throughput. Finally, experiments demonstrate the effectiveness and feasibility of the method, with data efficiency around 22% higher than that of the existing system and low installation cost.
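
The abstract does not spell out its own splitting procedure, so the following is only a minimal sketch of the standard rule that "adjusting the split size" acts on: Hadoop's stock FileInputFormat computes a split size as max(minSize, min(maxSize, blockSize)), where minSize and maxSize come from the mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize settings and blockSize from the HDFS block size (dfs.blocksize). It then cuts each file into splits, letting the last split grow up to 10% larger than the split size. The function names below are hypothetical, and this is not the authors' method.

```python
# Sketch mirroring the split-size rule used by Hadoop's FileInputFormat.
# Function names are hypothetical; this is not the paper's own algorithm.

SPLIT_SLOP = 1.1                 # last split may be up to 10% larger than split_size
LONG_MAX = 2**63 - 1             # default max split size (Java Long.MAX_VALUE)


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Clamp the HDFS block size between the configured min and max split sizes."""
    return max(min_size, min(max_size, block_size))


def plan_splits(file_length: int, block_size: int,
                min_size: int = 1, max_size: int = LONG_MAX) -> list[tuple[int, int]]:
    """Return (offset, length) pairs for the input splits of one file."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    offset = 0
    remaining = file_length
    # Emit full-size splits while more than SPLIT_SLOP splits' worth of data remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((offset, split_size))
        offset += split_size
        remaining -= split_size
    # The tail (up to 1.1 * split_size) becomes the final split.
    # Empty files produce no splits in this simplified sketch.
    if remaining != 0:
        splits.append((offset, remaining))
    return splits


if __name__ == "__main__":
    MiB = 1024 * 1024
    for label, kwargs in [("default", {}),
                          ("maxsize=64MiB", {"max_size": 64 * MiB}),
                          ("minsize=256MiB", {"min_size": 256 * MiB})]:
        n = len(plan_splits(1024 * MiB, 128 * MiB, **kwargs))
        print(f"1 GiB file, 128 MiB blocks, {label}: {n} splits")
```

For a 1 GiB file on 128 MiB blocks this yields 8 splits by default, 16 when maxsize is lowered to 64 MiB, and 4 when minsize is raised to 256 MiB. Each split normally feeds one map task, so lowering maxsize below the block size produces more, smaller map tasks, while raising minsize above it produces fewer, larger ones.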
