Architecture Design for Hadoop No-SQL and Hive

Authors

  • A. Antony Prakash  Assistant Professor, Information Tech, St Joseph's College - Tiruchirappalli, Tamil Nadu, India
  • Dr. A. Aloysius  Assistant Professor, Computer Science, St Joseph's College - Tiruchirappalli, Tamil Nadu, India

Keywords:

Big Data, Hadoop, MapReduce, Apache Hive, NoSQL, Overflow.

Abstract

Big data emerged when traditional relational database systems could no longer handle the unstructured data (weblogs, videos, photos, social updates, human behaviour) generated today by organisations, social media, and other data-generating sources. Data that is very large in volume, highly diverse in variety, or moving at high velocity is called big data. Analyzing big data is challenging because it involves large distributed file systems, which must be fault tolerant, flexible, and scalable. The technologies used by big data applications to handle massive data include Hadoop, MapReduce, Apache Hive, NoSQL, HPCC, and Overflow. These technologies handle data volumes on scales from kilobytes (KB), megabytes (MB), and terabytes (TB) up to petabytes (PB), zettabytes (ZB), and yottabytes (YB). This paper discusses various technologies for handling big data, along with the advantages and disadvantages of each in addressing the problem of massive data.

Published

2018-02-28

Section

Research Articles

How to Cite

[1]
A. Antony Prakash, Dr. A. Aloysius, "Architecture Design for Hadoop No-SQL and Hive", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 1, pp. 1069-1077, January-February 2018.