Hybrid Job-Driven Scheduling for Heterogeneous MapReduce Clusters

Authors

  • J. Sivarani, Department of Computer Science, Sri Padmavathi University, Tirupati, India
  • T. Subramanyam, Asst. Professor, Department of Computer Science, Sri Padmavathi University, Tirupati, India

Keywords:

MapReduce, Hadoop, Map-task Scheduling, Reduce-task Scheduling, Heterogeneous virtual MapReduce clusters

Abstract

It is cost-efficient for a tenant with a limited budget to establish a heterogeneous virtual MapReduce cluster by renting various virtual private servers (VPSs) from a VPS provider. Because MapReduce still performs poorly on heterogeneous clusters, and to provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a hybrid job-driven scheduling scheme (JoSS for short) from a tenant's perspective. JoSS provides not only job-level scheduling, but also map-task level scheduling and reduce-task level scheduling. The deployment of MapReduce in data centers and clouds presents several challenges; JoSS aims to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations, JoSS-Task (JoSS-T) and JoSS-Job (JoSS-J), are further introduced, one aimed at better map-data locality and the other at faster task assignment. We conduct extensive experiments to evaluate and compare the two variations with the current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead, without incurring significant overhead. Additionally, the two variations are each suitable for different MapReduce-workload scenarios and provide the best job performance among all tested algorithms.

Published

2017-10-31

Issue

Volume 2, Issue 5, September-October 2017

Section

Research Articles

How to Cite

[1]
J. Sivarani, T. Subramanyam, "Hybrid Job-Driven Scheduling for Heterogeneous MapReduce Clusters," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 5, pp. 330-341, September-October 2017.