GPU Parallel Computing Architectures: Unlocking the Power of Parallelism for High-Performance Applications

Authors

  • Anil Kumar Chunduru, Arizona State University, USA

DOI:

https://doi.org/10.32628/CSEIT24106175

Keywords:

GPU Parallel Computing, CUDA and OpenCL, Tensor Cores, High-Performance Computing, Scientific Simulations

Abstract

Graphics Processing Units (GPUs) have evolved from specialized graphics rendering hardware into powerful parallel computing architectures that have reshaped high-performance computing across diverse domains. This article explores the fundamental principles of GPU parallel computing architectures, their design, and their impact on modern computational challenges. We begin by examining the multi-core structure, memory hierarchy, and data processing capabilities of GPUs, including the SIMD execution model and thread organization. We then turn to the prominent programming models CUDA and OpenCL, discussing their features and comparative advantages. We explore how GPUs are used for general-purpose computing in scientific simulations, machine learning, and big data analytics, and we address the challenges inherent in GPU parallel computing, such as data transfer bottlenecks and load balancing. Recent technological advancements, including tensor cores, unified memory architecture, and ray tracing acceleration, are analyzed for their transformative potential. The article concludes by examining future directions in GPU technology, including integration with emerging technologies like quantum computing, advances in energy efficiency, and the potential impact on solving complex global challenges. Through this analysis, we illustrate the pivotal role of GPU parallel computing architectures in shaping the future of high-performance computing and in addressing some of the world's most pressing computational problems.

References

J. Nickolls and W. J. Dally, "The GPU Computing Era," IEEE Micro, vol. 30, no. 2, pp. 56-69, March-April 2010. [Online]. Available: https://ieeexplore.ieee.org/document/5446251 DOI: https://doi.org/10.1109/MM.2010.41

M. Garland and D. B. Kirk, "Understanding throughput-oriented architectures," Communications of the ACM, vol. 53, no. 11, pp. 58-66, Nov. 2010. [Online]. Available: https://dl.acm.org/doi/10.1145/1839676.1839694 DOI: https://doi.org/10.1145/1839676.1839694

S. Ryoo et al., "Optimization principles and application performance evaluation of a multithreaded GPU using CUDA," in Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '08), 2008, pp. 73-82. [Online]. Available: https://dl.acm.org/doi/10.1145/1345206.1345220 DOI: https://doi.org/10.1145/1345206.1345220

W. W. L. Fung and T. M. Aamodt, "Thread Block Compaction for Efficient SIMT Control Flow," in 2011 IEEE 17th International Symposium on High Performance Computer Architecture, 2011, pp. 25-36. [Online]. Available: https://ieeexplore.ieee.org/document/5749714 DOI: https://doi.org/10.1109/HPCA.2011.5749714

J. Cheng, M. Grossman, and T. McKercher, "Professional CUDA C Programming," Wrox Press, 2014, ISBN: 978-1-118-73932-7. [Online]. Available: https://www.wiley.com/en-us/Professional+CUDA+C+Programming-p-9781118739310

J. E. Stone, D. Gohara, and G. Shi, "OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems," Computing in Science & Engineering, vol. 12, no. 3, pp. 66-73, May-June 2010. [Online]. Available: https://ieeexplore.ieee.org/document/5457293 DOI: https://doi.org/10.1109/MCSE.2010.69

S. Mittal and J. S. Vetter, "A Survey of CPU-GPU Heterogeneous Computing Techniques," ACM Computing Surveys, vol. 47, no. 4, Article 69, July 2015. [Online]. Available: https://dl.acm.org/doi/10.1145/2788396 DOI: https://doi.org/10.1145/2788396

V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, Dec. 2017. [Online]. Available: https://ieeexplore.ieee.org/document/8114708 DOI: https://doi.org/10.1109/JPROC.2017.2761740

S. Mittal and J. S. Vetter, "A Survey of Methods for Analyzing and Improving GPU Energy Efficiency," ACM Computing Surveys, vol. 47, no. 2, Article 19, Jan. 2015. [Online]. Available: https://dl.acm.org/doi/10.1145/2636342 DOI: https://doi.org/10.1145/2636342

Published

12-11-2024

Section

Research Articles
