Different Machine Learning Algorithms used for Secure Software Advance using Software Repositories

Authors

  • Kanchan Chaudhary, Department of Computer Science and Engineering, Integral University, Lucknow, Uttar Pradesh, India
  • Dr. Shashank Singh, Department of Computer Science and Engineering, Integral University, Lucknow, Uttar Pradesh, India

DOI:

https://doi.org/10.32628/CSEIT2390225

Keywords:

Machine Learning, Topic Modeling, Cyber Security, CAPEC, MITRE

Abstract

In the present phase of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world holds a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, and health data. Knowledge of artificial intelligence (AI), and of machine learning (ML) in particular, is key to analyzing these data intelligently and to building the corresponding smart, automated applications. Cyber security attacks are growing significantly alongside modern technology and advanced software development, so cyber security defenses must be built into every phase of software development. It is especially important to identify and implement the relevant vulnerability controls early in the software development life cycle, i.e., in the requirements phase. The Common Attack Pattern Enumeration and Classification (CAPEC) is a publicly available software repository from MITRE that currently lists 555 attack patterns. As cyber security continues to grow in complexity, machine learning plays an increasingly important role in automating the identification of vulnerabilities during software development, helping developers build secure software. This paper presents a survey of machine learning algorithms used for secure software development with software repositories.
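As a rough illustration of the requirements-phase idea described in the abstract, the sketch below ranks a few attack-pattern summaries by textual similarity to a single software requirement, using TF-IDF weighting and cosine similarity, a common text-similarity baseline. This is not presented as the authors' method or as any surveyed system. The pattern names and descriptions in the hypothetical `ATTACK_PATTERNS` dictionary are paraphrased placeholders, not real CAPEC entries, and the helper functions and sample requirement are likewise assumptions made for this example. A practical tool would load descriptions from the CAPEC catalog and would likely use a trained classifier or topic model rather than raw similarity.

```python
import math
import re
from collections import Counter

# Hypothetical, paraphrased attack-pattern summaries for illustration only;
# these are NOT entries copied from the CAPEC catalog.
ATTACK_PATTERNS = {
    "SQL injection": "attacker injects sql into a query built from user input to read or modify database records",
    "Cross-site scripting": "attacker injects script into a web page that is rendered in the browser of another user",
    "Password brute force": "attacker repeatedly guesses the password of a user account until login succeeds",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: terms present in every document still get weight 1.
        vectors.append({t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_attack_patterns(requirement):
    """Rank the attack patterns by similarity to one requirement, best first."""
    names = list(ATTACK_PATTERNS)
    # Compute IDF over the pattern descriptions plus the requirement itself.
    vectors = tfidf_vectors([ATTACK_PATTERNS[name] for name in names] + [requirement])
    query = vectors[-1]
    scores = [(name, cosine(query, vec)) for name, vec in zip(names, vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    req = ("The system shall let a user log in with a password and lock "
           "the account after failed login attempts.")
    for name, score in rank_attack_patterns(req):
        print(f"{score:.3f}  {name}")
```

Running it prints the three patterns in descending order of similarity. The password brute-force summary ranks first because it shares the comparatively rare terms "password", "account", and "login" with the requirement, while the other two summaries share only common words such as "a" and "user".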

Published

2023-04-30

Issue

Volume 9, Issue 2, March-April 2023

Section

Research Articles

How to Cite

[1]
Kanchan Chaudhary, Dr. Shashank Singh, "Different Machine Learning Algorithms used for Secure Software Advance using Software Repositories," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 9, Issue 2, pp. 300-317, March-April 2023. Available at doi: https://doi.org/10.32628/CSEIT2390225