Traffic-Aware Partition and Aggregation for Big Data Applications in Map-Reduce

Dinesh Kumar S.; Siddique Ibrahim S. P.; Kirubakaran R

doi:10.32628/CSEIT1722394

Authors

Dinesh Kumar S., Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, Tamil Nadu, India
Siddique Ibrahim S. P., Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, Tamil Nadu, India
Kirubakaran R, Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, Tamil Nadu, India

Keywords:

Map Reduce, Hadoop, Stragglers, Partition

Abstract

Map Reduce is a programming model, with an associated implementation, for processing and generating big data sets with a parallel, distributed algorithm on a cluster. It simplifies large-scale data processing on commodity clusters by exploiting parallel map tasks and reduce tasks. Although many efforts have been made to improve the performance of Map Reduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in job performance. Traditionally, a hash function is used to partition intermediate data among reduce tasks. This approach, however, is not traffic-efficient, because neither the network topology nor the data size associated with each key is taken into consideration. The objective of this system is to reduce the network traffic cost of a Map Reduce job by designing a traffic-aware intermediate data partition scheme.
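The abstract contrasts hash partitioning with a traffic-aware partition scheme but does not spell out the algorithm. The Python sketch below illustrates the difference under stated assumptions. It assumes the intermediate data size for every (mapper, key) pair and a hop-distance table between mapper and reducer nodes are known in advance. The cost model (bytes multiplied by hops), the greedy heaviest-key-first placement, the per-reducer capacity limit, and all function and variable names are hypothetical illustrations, not the authors' method. `zlib.crc32` stands in for a stable key hash, in the spirit of Hadoop's default HashPartitioner.

```python
import zlib

def hash_partition(key, num_reducers):
    """Traffic-oblivious baseline: stable hash of the key modulo the number
    of reduce tasks. Ignores topology and per-key data size."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def traffic_cost(assignment, sizes, hops):
    """Total shuffle cost (hypothetical model): bytes each mapper emits for
    a key, weighted by the hop distance to that key's reducer."""
    return sum(nbytes * hops[mapper][assignment[key]]
               for (mapper, key), nbytes in sizes.items())

def traffic_aware_partition(keys, reducers, sizes, hops, capacity):
    """Greedy heuristic (an assumption, not the paper's algorithm): place the
    heaviest keys first, each on the cheapest reducer that still has room
    under `capacity` bytes; fall back to the least-loaded reducer."""
    mappers = {m for (m, _) in sizes}
    key_total = {k: sum(sizes.get((m, k), 0) for m in mappers) for k in keys}
    load = {r: 0 for r in reducers}
    assignment = {}
    for key in sorted(keys, key=lambda k: -key_total[k]):
        # Traffic cost of sending all of this key's data to each reducer.
        costs = {r: sum(sizes.get((m, key), 0) * hops[m][r] for m in mappers)
                 for r in reducers}
        candidates = [r for r in reducers
                      if load[r] + key_total[key] <= capacity]
        if not candidates:
            candidates = [min(reducers, key=lambda r: load[r])]
        best = min(candidates, key=costs.__getitem__)
        assignment[key] = best
        load[best] += key_total[key]
    return assignment

# Hypothetical example: two racks; mapper m0 shares a rack with reducer 0,
# m1 with reducer 1. Local transfer costs 0 hops, cross-rack costs 2.
hops = {"m0": {0: 0, 1: 2}, "m1": {0: 2, 1: 0}}
sizes = {("m0", "a"): 10, ("m1", "a"): 1,
         ("m0", "b"): 1, ("m1", "b"): 8,
         ("m0", "c"): 5, ("m1", "c"): 5}
keys = ["a", "b", "c"]

hashed = {k: hash_partition(k, 2) for k in keys}
aware = traffic_aware_partition(keys, [0, 1], sizes, hops, capacity=20)
print("hash cost:", traffic_cost(hashed, sizes, hops))
print("traffic-aware cost:", traffic_cost(aware, sizes, hops))
```

In this toy input the greedy placement sends key `a` to reducer 0 and keys `b` and `c` to reducer 1, for a cost of 14, which is the minimum achievable here. The hash baseline may do worse because it never looks at where each key's data originates. The capacity check is included only to show why a traffic-aware scheme must also balance reducer load. Otherwise, minimizing traffic alone could pile every key onto one reducer.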

