Data Analysis Using R and Hadoop

Authors

  • Amit Rajbanshi  Department of Computer Science and Engineering, Adesh College of Engineering & Technology, Chandigarh, Kharar, Punjab, India
  • Birendra Kumar Sah  Department of Computer Science and Engineering, Adesh College of Engineering & Technology, Chandigarh, Kharar, Punjab, India
  • C. K. Raina  Department of Computer Science and Engineering, Adesh College of Engineering & Technology, Chandigarh, Kharar, Punjab, India

Keywords:

R, Big Data, Hadoop, Rhipe, RHadoop, Streaming

Abstract

Analyzing and managing big data can be very difficult using classical means such as relational database management systems or desktop software packages for statistics and visualization. Instead, big data requires large clusters with hundreds or even thousands of computing nodes. Official statistics is increasingly considering big data for deriving new statistics, because big data sources could produce more relevant and timely statistics than traditional sources. One of the software tools successfully and widely used for the storage and processing of big data sets on clusters of commodity hardware is Hadoop. The Hadoop framework contains libraries, a distributed file system (HDFS), and a resource-management platform, and it implements a version of the MapReduce programming model for large-scale processing. In this paper we investigate the possibilities of integrating Hadoop with R, a popular software package used for statistical computing and data visualization. We present three ways of integrating them: R with Streaming, Rhipe, and RHadoop, and we emphasize the advantages and disadvantages of each solution.
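The MapReduce model and the Streaming style of integration mentioned in the abstract can be illustrated with a minimal local sketch. It assumes the usual Hadoop Streaming convention, in which a mapper and a reducer exchange tab-separated key/value text records and the framework sorts records by key between the two steps. The word-count task, the function names (mapper, reducer, run_local) and the sample input are hypothetical and chosen only for illustration; Python is used here for brevity, whereas the paper pairs Hadoop with R scripts. The sketch runs on one machine and does not contact a Hadoop cluster.

```python
from itertools import groupby


def record_key(record):
    """Return the key part of a tab-separated 'key<TAB>value' record."""
    return record.split("\t", 1)[0]


def mapper(lines):
    """Map step: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Reduce step: sum the counts of each key in key-sorted input."""
    for word, group in groupby(sorted_records, key=record_key):
        total = sum(int(record.split("\t", 1)[1]) for record in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Local stand-in for the framework: map, sort by key, then reduce."""
    mapped = sorted(mapper(lines), key=record_key)
    return list(reducer(mapped))


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read files from HDFS.
    sample = ["big data needs big clusters", "R meets Hadoop"]
    for record in run_local(sample):
        print(record)
```

In an actual Streaming job the two steps would be separate executables reading standard input and writing standard output, which is what allows scripts written in R or any other language to act as mapper and reducer.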

Published

2017-12-31

Section

Research Articles

How to Cite

[1]
Amit Rajbanshi, Birendra Kumar Sah, C. K. Raina, "Data Analysis Using R and Hadoop", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 6, pp. 1093-1097, November-December 2017.