Wide Area Analytics : Efficient Analytics For A Geo Distributed Datacenters

Authors (2) : Alapati Janardhana Rao, Bellamkonda Naresh

Large organizations today operate data centers around the globe, where massive amounts of data are produced and consumed by local users. Despite their geographically diverse origins, such data must be analyzed and mined as a whole. We call the problem of supporting rich DAGs of computation across geographically distributed data Wide-Area Big-Data (WABD). To the best of our knowledge, WABD is neither supported by currently deployed systems nor sufficiently studied in the literature; it is addressed today by continuously copying raw data to a central location for analysis. We observe from production workloads that WABD is important for large organizations and that centralized solutions incur substantial cross-data-center network costs. We argue that these trends will only worsen as the gap between data volumes and transoceanic bandwidth widens. Further, emerging concerns over data sovereignty and privacy may trigger government regulations that threaten the very viability of centralized solutions. To address WABD we propose WANalytics, a system that pushes computation to edge data centers, automatically optimizing workflow execution plans and replicating data when needed. Our Hadoop-based prototype delivers a 257x reduction in WAN bandwidth on a production workload from Microsoft. We round out our evaluation by also demonstrating substantial gains for three standard benchmarks: TPC-CH, Berkeley Big Data, and BigBench.

Authors and Affiliations

Alapati Janardhana Rao
MCA Department, Vignan's Lara Institute of Technology and Science, Vadlamudi, Guntur, Andhra Pradesh, India
Bellamkonda Naresh
MCA Department, Vignan's Lara Institute of Technology and Science, Vadlamudi, Guntur, Andhra Pradesh, India

Keywords : Big Data, Analytics, Geo-Distributed Datacenters

Publication Details

Published in : Volume 4 | Issue 2 | March-April 2018
Date of Publication : 2018-04-30
License : This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 08-19
Manuscript Number : CSEIT1833607
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Alapati Janardhana Rao, Bellamkonda Naresh, "Wide Area Analytics : Efficient Analytics For A Geo Distributed Datacenters", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 4, Issue 2, pp. 08-19, March-April 2018
URL : http://ijsrcseit.com/CSEIT1833607

