Traffic-Aware Partition and Aggregation for Big Data Applications in Map-Reduce

Authors

  • Dinesh Kumar S.  Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, Tamil Nadu, India
  • Siddique Ibrahim S. P.  Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, Tamil Nadu, India
  • Kirubakaran R.  Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, Tamil Nadu, India

Keywords:

Map Reduce, Hadoop, Stragglers, Partition

Abstract

Map Reduce is a programming model, with an associated implementation, for processing and generating big data sets with a parallel, distributed algorithm on a cluster. It simplifies large-scale data processing on commodity clusters by exploiting parallel map tasks and reduce tasks. Although many efforts have been made to improve the performance of Map Reduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in performance. Traditionally, a hash function is used to partition intermediate data among reduce tasks. This approach is not traffic-efficient, because it takes neither the network topology nor the data size associated with each key into consideration. The objective of this system is to reduce the network traffic cost of a Map Reduce job by designing an intermediate data partition scheme.
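The contrast between hash partitioning and a traffic-aware alternative can be illustrated with a minimal sketch. The abstract does not specify the partition scheme itself, so the greedy heuristic below is only an assumed stand-in, not the authors' algorithm. The names `sizes`, `distance`, `traffic_cost`, and `greedy_traffic_aware_partition` are hypothetical, and the cost model (data size multiplied by mapper-to-reducer distance) is an assumption made for illustration.

```python
import zlib
from collections import defaultdict


def hash_partition(key, num_reducers):
    """Hash-style partitioning: the reducer depends on the key's hash only."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def traffic_cost(assignment, sizes, distance):
    """Assumed cost model: sum of (data size x mapper-to-reducer distance)."""
    return sum(
        size * distance[mapper][assignment[key]]
        for (mapper, key), size in sizes.items()
    )


def greedy_traffic_aware_partition(sizes, distance, num_reducers):
    """Hypothetical heuristic: send each key to the reducer that minimizes
    that key's own shuffle cost. Reducer load balance is ignored here."""
    per_key = defaultdict(dict)
    for (mapper, key), size in sizes.items():
        per_key[key][mapper] = size

    assignment = {}
    for key, by_mapper in per_key.items():
        assignment[key] = min(
            range(num_reducers),
            key=lambda r: sum(s * distance[m][r] for m, s in by_mapper.items()),
        )
    return assignment


# Illustrative data (made up): sizes[(mapper, key)] is the intermediate data
# volume that a mapper emits for a key; distance[mapper][reducer] is in hops.
sizes = {(0, "a"): 100, (1, "a"): 10, (0, "b"): 5, (1, "b"): 80}
distance = [[0, 2], [2, 0]]  # mapper i is co-located with reducer i
num_reducers = 2

keys = {key for (_, key) in sizes}
hash_assignment = {key: hash_partition(key, num_reducers) for key in keys}
aware_assignment = greedy_traffic_aware_partition(sizes, distance, num_reducers)

print("hash cost: ", traffic_cost(hash_assignment, sizes, distance))
print("aware cost:", traffic_cost(aware_assignment, sizes, distance))
```

Under this cost model the total cost is a sum of independent per-key terms, so the greedy choice can never cost more than the hash assignment. A practical scheme would also have to account for reducer load and for aggregation of intermediate data, which this sketch leaves out.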

Published

2017-06-30

Section

Research Articles

How to Cite

[1]
Dinesh Kumar S., Siddique Ibrahim S. P., Kirubakaran R, "Traffic-Aware Partition and Aggregation for Big Data Applications in Map-Reduce", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 3, pp. 266-272, May-June 2017.