Traffic-Aware Partition and Aggregation for Big Data Applications in Map-Reduce

Authors (3): Dinesh Kumar S., Siddique Ibrahim S. P., Kirubakaran R

The Map Reduce programming model simplifies large-scale data processing on commodity clusters by exploiting parallel map tasks and reduce tasks. Map Reduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. Although many efforts have been made to improve the performance of Map Reduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in performance. Traditionally, a hash function is used to partition intermediate data among reduce tasks. This approach is not traffic-efficient, because neither the network topology nor the data size associated with each key is taken into consideration. The objective of this system is to reduce the network traffic cost of a Map Reduce job by designing an intermediate data partition scheme.
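The contrast between hash partitioning and a traffic-aware alternative can be illustrated with a minimal sketch. This is not the authors' scheme: it is a simple greedy heuristic written for illustration, and the names `hash_partition`, `traffic_cost`, `greedy_traffic_aware_partition`, `sizes`, and `distance` are all hypothetical. It assumes that shuffle cost is the data size multiplied by the hop distance between the mapper node and the reducer node, and it ignores reducer load balancing.

```python
import zlib
from collections import defaultdict


def hash_partition(key, num_reducers):
    """Conventional partitioning: the reducer depends only on the key's hash."""
    # crc32 is used instead of hash() so the result is stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def traffic_cost(assignment, sizes, distance):
    """Total shuffle cost under the assumed model: bytes moved times hop distance.

    sizes[(mapper, key)]      -> bytes of intermediate data for `key` at `mapper`
    distance[mapper][reducer] -> hop count between the two nodes
    assignment[key]           -> reducer index chosen for `key`
    """
    return sum(
        size * distance[mapper][assignment[key]]
        for (mapper, key), size in sizes.items()
    )


def greedy_traffic_aware_partition(sizes, distance, num_reducers):
    """Send each key to the reducer that minimizes that key's own shuffle cost."""
    per_key = defaultdict(dict)
    for (mapper, key), size in sizes.items():
        per_key[key][mapper] = size

    assignment = {}
    for key, by_mapper in per_key.items():
        assignment[key] = min(
            range(num_reducers),
            key=lambda r: sum(s * distance[m][r] for m, s in by_mapper.items()),
        )
    return assignment


if __name__ == "__main__":
    # Hypothetical two-node cluster: mapper i is co-located with reducer i.
    distance = [[0, 2], [2, 0]]
    sizes = {(0, "a"): 100, (1, "a"): 10, (0, "b"): 5, (1, "b"): 200}

    hashed = {k: hash_partition(k, 2) for k in ("a", "b")}
    aware = greedy_traffic_aware_partition(sizes, distance, 2)

    print("hash-based cost:   ", traffic_cost(hashed, sizes, distance))
    print("traffic-aware cost:", traffic_cost(aware, sizes, distance))
```

Because the hash-based choice never looks at `sizes` or `distance`, it can place a key far from the mapper that produced most of its data. The greedy variant uses both, which is the kind of information the abstract says hash partitioning leaves out.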

Authors and Affiliations

Dinesh Kumar S.
Department of Computer Science and Engineering, Kumarguru College of Technology, Coimbatore, TamilNadu, India
Siddique Ibrahim S. P.
Department of Computer Science and Engineering, Kumarguru College of Technology, Coimbatore, TamilNadu, India
Kirubakaran R
Department of Computer Science and Engineering, Kumarguru College of Technology, Coimbatore, TamilNadu, India

Keywords: Map Reduce, Hadoop, Stragglers, Partition


Publication Details

Published in : Volume 2 | Issue 3 | May-June 2017
Date of Publication : 2017-06-30
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 266-272
Manuscript Number : CSEIT1722394
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Dinesh Kumar S., Siddique Ibrahim S. P., Kirubakaran R, "Traffic-Aware Partition and Aggregation for Big Data Applications in Map-Reduce", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 3, pp. 266-272, May-June 2017.
Journal URL : http://ijsrcseit.com/CSEIT1722394
