Spam Comment Detection on YouTube Using Random Forest ML Technique

Authors

  • C. Lakshman, M.C.A Student, Department of M.C.A, KMMIPS, Tirupati (D.t), Andhra Pradesh, India
  • S. Noortaj, Assistant Professor, Department of M.C.A, KMMIPS, Tirupati (D.t), Andhra Pradesh, India

Keywords:

YouTube, Machine Learning, Spam Detection, RF, SVM, LSTM, ExtraTrees Classifier, Classification, Accuracy, Algorithm Comparison, Comment Filtering, Spam Comments

Abstract

This project addresses the problem of detecting spam comments on YouTube. YouTube is becoming steadily more popular as a platform on which users communicate with one another through comments, and the core purpose of spam detection is to maintain the quality of the platform itself. In this study, four machine learning models, Random Forest (RF), Support Vector Machine (SVM), Long Short-Term Memory (LSTM), and the ExtraTrees Classifier, are assessed for their ability to identify spam comments. The RF model achieved a strong accuracy of 95% and was judged well suited to handling very large datasets and complex patterns. The ExtraTrees Classifier performed similarly to the SVM model, likewise attaining an accuracy of 95%. The LSTM, by contrast, performed poorly, with a reported accuracy of 95%, which exposes its limitations for this task.
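The abstract does not describe the features or hyperparameters used, so the following is only a minimal, self-contained sketch of how a Random Forest can classify comments as spam (1) or not spam (0). It assumes binary bag-of-words features, Gini impurity for splits, bootstrap sampling per tree, a random subset of about sqrt(vocabulary size) candidate words at each split, and majority voting. All names (`RandomForest`, `build_tree`, `vectorize`) and the `COMMENTS` toy data are hypothetical illustrations, not the authors' implementation or dataset. In practice one would typically use a library implementation such as scikit-learn's `RandomForestClassifier` (or `ExtraTreesClassifier`, which additionally randomizes split thresholds).

```python
import math
import random
import re
from collections import Counter

# Hypothetical toy comments (1 = spam, 0 = not spam); not the paper's dataset.
COMMENTS = [
    ("Check out my channel for free gift cards", 1),
    ("Subscribe to my channel please", 1),
    ("Click this link to win a free iPhone", 1),
    ("Free followers visit my page now", 1),
    ("This song brings back so many memories", 0),
    ("Great video, the editing is amazing", 0),
    ("I love the guitar solo at the end", 0),
    ("Who is still listening to this in 2025", 0),
]

def tokenize(text):
    """Lower-case a comment and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocab(comments):
    """Map each distinct training token to a feature column index."""
    vocab = sorted({tok for c in comments for tok in tokenize(c)})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(comment, vocab):
    """Binary bag-of-words vector; tokens outside the vocabulary are ignored."""
    row = [0] * len(vocab)
    for tok in tokenize(comment):
        if tok in vocab:
            row[vocab[tok]] = 1
    return row

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label (used for leaves and for the forest vote)."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, depth, max_depth, max_features, rng):
    """Grow a tree: leaves are labels, internal nodes are (feature, left, right)."""
    if len(set(y)) == 1 or depth >= max_depth:
        return majority(y)
    n_feat = len(X[0])
    k = n_feat if max_features is None else min(max_features, n_feat)
    best = None
    for f in rng.sample(range(n_feat), k):  # random feature subset per split
        left = [i for i, row in enumerate(X) if row[f] == 0]
        right = [i for i, row in enumerate(X) if row[f] == 1]
        if not left or not right:
            continue  # this word does not separate the node
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        return majority(y)
    _, f, left, right = best
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  depth + 1, max_depth, max_features, rng)
    return (f, grow(left), grow(right))

def predict_tree(node, row):
    """Follow splits until a leaf label is reached."""
    while isinstance(node, tuple):
        f, left, right = node
        node = right if row[f] == 1 else left
    return node

class RandomForest:
    def __init__(self, n_trees=25, max_depth=10, max_features="sqrt",
                 bootstrap=True, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.max_features, self.bootstrap, self.seed = max_features, bootstrap, seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        k = self.max_features
        if k == "sqrt":
            k = max(1, int(math.sqrt(len(X[0]))))
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap: sample training rows with replacement for each tree.
            idx = ([rng.randrange(len(X)) for _ in X] if self.bootstrap
                   else list(range(len(X))))
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, k, rng))
        return self

    def predict(self, X):
        """Majority vote over all trees for each row."""
        return [majority([predict_tree(t, row) for t in self.trees]) for row in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

if __name__ == "__main__":
    texts = [t for t, _ in COMMENTS]
    labels = [l for _, l in COMMENTS]
    vocab = build_vocab(texts)
    forest = RandomForest(seed=42).fit([vectorize(t, vocab) for t in texts], labels)
    new = ["subscribe to my channel for a free gift", "amazing guitar solo"]
    print(forest.predict([vectorize(c, vocab) for c in new]))
```

Accuracy, the metric the abstract uses to compare models, is simply the fraction of comments whose predicted label matches the true label; with a toy set this small the number says little, and a real comparison would use a held-out test split.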

Published

05-05-2025

Section

Research Articles