Phishing Detection Using Machine Learning Algorithm
DOI: https://doi.org/10.32628/CSEIT2410228
Keywords: Phishing Detection, Feature Collection, Feature Selection, Classification, Machine Learning, Explainable AI, Data Sets
Abstract
Phishing is a criminal scheme to steal users' personal data and other credentials. It is a fraud that obtains a victim's confidential information, such as passwords, bank account details, credit card numbers, and financial usernames and passwords, which the attacker can later misuse. The use of machine learning algorithms in phishing detection has gained significant attention in recent years. This research paper evaluates the effectiveness of various machine learning algorithms in detecting phishing URLs and websites. The algorithms tested in this study are Decision Tree, Random Forest, Multilayer Perceptron, XGBoost, Autoencoder Neural Network, and Support Vector Machines. A dataset of URLs is used to train and test the algorithms, and their performance is evaluated on accuracy, precision, recall, and F1 score. Phishing URLs are taken from PhishTank and legitimate URLs from the University of New Brunswick. The results show that the Random Forest and XGBoost algorithms outperform the other algorithms in accuracy and the other performance metrics, and the system achieves an overall accuracy of 98%.
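The evaluation described above can be illustrated with a minimal Python sketch, written under stated assumptions. The lexical features in `url_features` (URL length, dot count, `@` symbol, IP-address host, hyphens, HTTPS, path depth) are hypothetical examples of URL features often used in phishing studies; the abstract does not list the paper's actual feature set. The `classification_metrics` helper computes the four metrics named in the abstract from confusion-matrix counts, treating phishing (label 1) as the positive class. The trained models themselves (Decision Tree, Random Forest, XGBoost, and so on) are not reproduced; any classifier that maps feature vectors to labels could supply `y_pred`.

```python
import re
from urllib.parse import urlparse

IPV4_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def url_features(url: str) -> dict:
    """Extract illustrative lexical features from a URL.

    Assumption: these features are hypothetical examples, not the
    feature set used in the paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is lowercased; None if no netloc
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "has_at_symbol": int("@" in url),
        "has_ip_host": int(bool(IPV4_PATTERN.fullmatch(host))),
        "num_hyphens": host.count("-"),
        "uses_https": int(parsed.scheme == "https"),
        "path_depth": parsed.path.count("/"),
    }


def classification_metrics(y_true, y_pred, positive=1) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    accuracy  = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    F1        = 2 * precision * recall / (precision + recall)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(url_features("http://192.168.0.1/login@secure/verify"))
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

For example, with true labels [1, 1, 0, 0, 1] and predictions [1, 0, 0, 1, 1] there are 2 true positives, 1 true negative, 1 false positive, and 1 false negative, giving an accuracy of 0.6 and a precision, recall, and F1 of 2/3 each. Reporting all four metrics matters because a phishing detector can reach high accuracy on an imbalanced dataset while still missing many phishing URLs, which only recall reveals.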