Splitting and Grouping of Jobs in Map Reduction for Various Multicore Processors

Authors

  • M. Navya, M.Tech Student, Department of Computer Science and Engineering, Padmavathi Mahila Visvavidyalayam, Tirupati, India
  • N. Padmaja, Assistant Professor, Department of Computer Science and Engineering, Padmavathi Mahila Visvavidyalayam, Tirupati, India

Keywords:

Map Reduce, Hadoop scheduler, DyScale, throughput, heterogeneous processors

Abstract

The functionality of modern multi-core processors is often driven by a given power budget, which requires designers to evaluate different trade-offs, e.g., to choose between many slow, power-efficient cores, fewer fast, power-hungry cores, or a combination of the two. Here, we prototype and evaluate a new Hadoop scheduler, called DyScale, that exploits the capabilities offered by heterogeneous cores within a single multi-core processor to achieve a variety of performance objectives. A typical Map Reduce workload contains jobs with different performance goals: large, throughput-oriented batch jobs and smaller, response-time-sensitive interactive jobs. Heterogeneous multi-core processors enable the creation of virtual resource pools based on "slow" and "fast" cores for multi-class priority scheduling. Since the same data can be accessed with either "slow" or "fast" slots, spare resources (slots) can be shared between the different resource pools. Using measurements from an actual experimental testbed and via simulation, we argue in favor of heterogeneous multi-core processors: they achieve faster (up to 40%) processing of small, interactive Map Reduce jobs while offering improved (up to 40%) throughput for large batch jobs. We evaluate the performance benefits of DyScale against the FIFO and Capacity job schedulers that are widely used in the Hadoop community.
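To make the scheduling idea in the abstract concrete, the following toy Java sketch (illustrative only; the class names, pool sizes, and the assign() helper are hypothetical and not taken from the DyScale paper) shows how two virtual slot pools built from "fast" and "slow" cores could serve two job classes, with spare slots borrowed across pools since both pools can access the same underlying data:

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class SlotPoolSketch {

        enum JobClass { INTERACTIVE, BATCH }

        // A pool of free task slots backed by cores of one speed grade.
        static final class SlotPool {
            final String name;
            final Deque<Integer> freeSlots = new ArrayDeque<>();
            SlotPool(String name, int slots) {
                this.name = name;
                for (int i = 0; i < slots; i++) freeSlots.push(i);
            }
            Integer take() { return freeSlots.isEmpty() ? null : freeSlots.pop(); }
        }

        // Pick a slot for a job: try the preferred pool first, then borrow a
        // spare slot from the other pool (both pools see the same HDFS data).
        static String assign(JobClass job, SlotPool fast, SlotPool slow) {
            SlotPool preferred = (job == JobClass.INTERACTIVE) ? fast : slow;
            SlotPool fallback  = (job == JobClass.INTERACTIVE) ? slow : fast;
            Integer slot = preferred.take();
            if (slot != null) return job + " -> " + preferred.name + " slot " + slot;
            slot = fallback.take();
            if (slot != null) return job + " -> borrowed " + fallback.name + " slot " + slot;
            return job + " -> queued (no free slots)";
        }

        public static void main(String[] args) {
            SlotPool fast = new SlotPool("fast", 2);   // few fast, power-hungry cores
            SlotPool slow = new SlotPool("slow", 4);   // many slow, power-efficient cores
            JobClass[] arrivals = { JobClass.INTERACTIVE, JobClass.BATCH,
                                    JobClass.INTERACTIVE, JobClass.INTERACTIVE,
                                    JobClass.BATCH };
            for (JobClass job : arrivals) System.out.println(assign(job, fast, slow));
        }
    }

In this sketch, interactive jobs fall back to slow slots only when all fast slots are busy, mirroring the abstract's point that spare slots can be shared between the resource pools without moving data.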


Published

2017-10-31

Section

Research Articles

How to Cite

[1]
M. Navya and N. Padmaja, "Splitting and Grouping of Jobs in Map Reduction for Various Multicore Processors," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 5, pp. 370-373, September-October 2017.