Analysis of Data Performance that Reduces Resource Utilization Overheads and Increases the Efficiency
DOI:
https://doi.org/10.32628/CSEIT228369
Keywords:
Big Data, Resource Utilization, Spark, Hadoop, Cloud Computing
Abstract
The volume of data is growing at a rapid and unpredictable rate, which makes Big Data processing necessary. This is especially true in business applications, where data volumes are huge and must nevertheless be processed efficiently. Traditional systems fail to process big data because most of it is unstructured. Resource utilization plays a vital role in improving the performance of distributed data processing. Resource gaps develop during execution, and they occur more frequently in heterogeneous environments. Previous techniques waste resources or use them inefficiently. Several platforms, such as Apache Hadoop and Apache Spark, are used to process data in distributed environments. We develop a new algorithm that reduces resource usage and increases performance. The algorithm is implemented in an Apache Spark distributed environment. The experimental results indicate efficient utilization of resources and an increase in performance.
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.