Hybrid Job-Driven Scheduling for Heterogeneous MapReduce Clusters
Keywords:
MapReduce, Hadoop, Map-task Scheduling, Reduce-task Scheduling, Heterogeneous virtual MapReduce clusters
Abstract
It is cost-efficient for a tenant with a limited budget to establish a heterogeneous virtual MapReduce cluster by renting various virtual private servers (VPSs) from a VPS provider. Because MapReduce still performs poorly on heterogeneous clusters, and to provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a hybrid job-driven scheduling scheme (JoSS for short) from a tenant's perspective. JoSS provides not only job-level scheduling but also Map-task-level and Reduce-task-level scheduling. Deploying MapReduce in data centers and clouds presents several challenges, and JoSS aims to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS, JoSS-Task (JoSS-T) and JoSS-Job (JoSS-J), are further introduced to separately achieve better map-data locality and faster task assignment. We conduct extensive experiments to evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are each suitable for different MapReduce-workload scenarios and provide the best job performance among all tested algorithms.
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.