Hadoop Periodic Jobs Using Data Blocks to Achieve Efficiency

Authors

  • Sujit Roy, Department of Computer Science and Engineering, Jatiya Kabi Kazi Nazrul Islam University, Trishal, Mymensingh, Bangladesh
  • Subrata Kumar Das, Department of Computer Science and Engineering, Jatiya Kabi Kazi Nazrul Islam University, Trishal, Mymensingh, Bangladesh
  • Indrani Mandal, Department of Computer Science and Engineering, Jatiya Kabi Kazi Nazrul Islam University, Trishal, Mymensingh, Bangladesh

Keywords:

Hadoop, MapReduce, Data efficiency, Data blocks, HDFS

Abstract

Hadoop is a powerful, fault-tolerant platform for managing, processing, and analyzing very large datasets. It is widely used for big data because it is effective, scalable, and well supported by large vendor and user communities. This paper proposes a new approach to processing data in Hadoop that improves data-processing efficiency by using synchronous data transmission, sending blocks of data from source to destination. We show how to divide the data into blocks for optimal efficiency by adjusting the split size or using an appropriate size of staffs. Because an effective Hadoop hardware configuration matches the requirements of each periodic task, our system can handle the data blocks in a way that increases both data efficiency and throughput. Experiments show that the method is effective and feasible, with high data efficiency (around 22% higher than the existing system) and low installation cost.
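As a rough illustration of why the split size matters, the sketch below shows how the number of input splits (and therefore map tasks) changes with the split-size bounds. It is not the authors' method: it only mirrors the rule that Hadoop's stock FileInputFormat uses to pick a split size, max(minSize, min(maxSize, blockSize)), and it ignores the small slop Hadoop allows on the last split. The function names and the 1 GiB file size are assumptions chosen for the example.

```python
import math

MB = 1024 * 1024

def compute_split_size(block_size, min_size, max_size):
    """Split-size rule of Hadoop's FileInputFormat:
    max(min_size, min(max_size, block_size)).
    min_size / max_size correspond to the job properties
    mapreduce.input.fileinputformat.split.minsize / .maxsize."""
    return max(min_size, min(max_size, block_size))

def count_splits(file_size, split_size):
    """Approximate number of input splits for one file.
    Simplification (assumption): Hadoop's tolerance for a slightly
    oversized final split is ignored here."""
    return math.ceil(file_size / split_size)

# Hypothetical example: a 1 GiB input file stored in 128 MiB HDFS blocks.
file_size = 1024 * MB
block_size = 128 * MB

# No effective bounds: the split size equals the block size.
default_split = compute_split_size(block_size, 1, 2**63 - 1)
# Raising the minimum above the block size gives fewer, larger splits.
larger_split = compute_split_size(block_size, 256 * MB, 2**63 - 1)
# Lowering the maximum below the block size gives more, smaller splits.
smaller_split = compute_split_size(block_size, 1, 64 * MB)

for label, size in [("default", default_split),
                    ("min=256MiB", larger_split),
                    ("max=64MiB", smaller_split)]:
    print(label, size // MB, "MiB ->", count_splits(file_size, size), "splits")
```

Fewer, larger splits reduce per-task startup overhead, while more, smaller splits increase parallelism; which setting is more efficient depends on the job and the cluster.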

Published

2018-04-30

Section

Research Articles

How to Cite

[1]
Sujit Roy, Subrata Kumar Das, Indrani Mandal, "Hadoop Periodic Jobs Using Data Blocks to Achieve Efficiency," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 3, pp. 122-127, March-April 2018.