Analysis of Data Performance that Reduces Resource Utilization Overheads and Increases the Efficiency

Anilkumar Ambore; Udaya Rani V

doi:10.32628/CSEIT228369

Authors

Anilkumar Ambore Research Scholar, VTU, Department of CSE, REVA ITM, Bangalore, India
Udaya Rani V Department of CSE, REVA ITM, Bangalore, India

DOI:

https://doi.org/10.32628/CSEIT228369

Keywords:

Big Data, Resource Utilization, Spark, Hadoop, Cloud Computing

Abstract

Data volumes are growing at an unpredictable rate, which makes Big Data processing necessary. This is especially true of business applications, where the volume of data is huge and must nonetheless be processed efficiently. Traditional systems fail to process Big Data because most of it is unstructured. Resource utilization plays a vital role in improving the performance of distributed data processing. Resource gaps develop during execution, and they occur more frequently in heterogeneous environments. Previous techniques waste resources or use them inefficiently. Several platforms, such as Apache Hadoop and Apache Spark, are used to process data in distributed environments. Here we develop a new algorithm that reduces resource usage and increases performance. The algorithm is implemented in an Apache Spark distributed environment. The experimental results indicate efficient utilization of resources and an increase in performance.

