Study of Machine Learning Techniques using Apache Spark

Authors (4): Soumya Manjunath Hegde, Shilpa .M, Soujanya .C .S, Urvashi Grover

The challenges of big data analysis are growing because of the huge volume of data collected daily from sources such as social media, weather forecasting, and mobile devices. This survey paper examines different aspects of Apache Spark usage, including its framework, its libraries, and related Spark technologies. The Spark platform provides a range of machine learning algorithms and can be deployed on virtualization platforms such as VMware vSphere. Spark is also used across different platforms to achieve high performance, reduce latency, and improve efficiency. The papers studied here compare Hadoop and Spark and report Spark to be the better platform, describing it as up to a hundred times faster and more efficient.

Authors and Affiliations

Soumya Manjunath Hegde
Eighth semester, Department of ISE, Vidyavardhaka College of Engineering, Mysuru, Karnataka, India
Shilpa .M
Eighth semester, Department of ISE, Vidyavardhaka College of Engineering, Mysuru, Karnataka, India
Soujanya .C .S
Eighth semester, Department of ISE, Vidyavardhaka College of Engineering, Mysuru, Karnataka, India
Urvashi Grover
Eighth semester, Department of ISE, Vidyavardhaka College of Engineering, Mysuru, Karnataka, India

Keywords: Weather forecast, virtualization, Hadoop, Spark, latency

  1. SPARK—A Big Data Processing Platform for Machine Learning. Jian Fu, Junwei Sun, Kaiyuan Wang. Wuhan, Hubei, China : IEEE, 2016.
  2. A Big Data Analysis Framework Using Apache Spark and Deep Learning. Anand Gupta, Hardeo Kumar Thakur. Delhi, India : s.n., 2017.
  3. Big Data Machine Learning using Apache Spark MLlib. Mehdi Assefi, Ehsun Behravesh, Guangchi Liu, Ahmad P. Tafti. USA : s.n., 2017.
  4. Research of Intrusion Detection Algorithm Based on Parallel SVM on Spark. Hongbing Wang, Youan Xiao and Yihong Long. Wuhan, Hubei Province, China : s.n.
  5. Online Internet Traffic Monitoring System Using Spark Streaming. 2018.
  6. Shade: A Differentially-Private Wrapper For Enterprise Big Data. Alexander Heifetz, Vaikkunth Mugunthan and Lalana Kagal. Cambridge, USA : s.n., 2017.
  7. Spark-BDD: Debugging Big Data Applications. Tyson Condie, Muhammad Ali Gulzar, Matteo Interlandi, Miryung Kim, Todd Millstein, Sai Deep Tetali, Seunghyun Yoo. California, Los Angeles : s.n.
  8. Spark-SIFT: A Spark-Based Large-Scale Image Feature Extract System. Xinming Zhan, YaoHua Yang, Li Shen. China : s.n., 2017.
  9. Parallelization of a Series of Extreme Learning Machine Algorithms Based on Spark. Tiantian Liu, Zhiyi Fang, Chen Zhao, Yingmin Zhou. China : s.n.
  10. Weather data analysis using Spark – An In-memory Computing framework. Ms. D. Jayanthi, Dr. G. Sumathi. India : s.n., 2017.
  11. Towards Development of Spark Based Agricultural Information System including Geo-Spatial Data. Purnima Shah, Deepak Hiremath, Sanjay Chaudhary. Ahmedabad, India : s.n., 2017.
  12. Scaling Machine Learning for Target Prediction in Drug Discovery using Apache Spark. Dries Harnie, Alexander E Vapirev, Jorg Kurt Wegner, Andery Gedich, Marvin Steijaert, Roel Wuyts and Wolfgang De Meuter. Belgium : s.n., 2015.
  13. Research on the forecast of Shared Bicycle rental demand based on spark machine learning framework. Zilu Kng, Yuting Zuo, Zhivin Huang, Feng Zhou, Penghui Chen. China : s.n., 2017.
  14. Real Time Road Traffic Event Detection using Twitter and Spark. Ketan R. Pandhare, Medha A Shah. India : s.n., 2017.
  15. Mobile Big Data Analytics Using Deep Learning and Apache Spark. Mohammad Abu Alsheikh, Dusit Niyato, Shaowei Lin, Hwee-Pink Tan, and Zhu Han. 2016.
  16. Survey on High Performance Analytics of Bigdata with Apache Spark. Ramkrushna C. Maheshwar, D. Haritha. India : s.n., 2016.

Publication Details

Published in : Volume 4 | Issue 6 | May-June 2018
Date of Publication : 2018-05-08
License : This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 23-29
Manuscript Number : CSEIT184606
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Soumya Manjunath Hegde, Shilpa .M, Soujanya .C .S, Urvashi Grover, "Study of Machine Learning Techniques using Apache Spark", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 4, Issue 6, pp. 23-29, May-June 2018.