An Intense Study of Machine Learning Research Approach to Identify Toxic Comments

Authors

  • Monika Dandotiya  Department of Computer Science, ITM University, Gwalior, Madhya Pradesh, India
  • Dr. Rajni Ranjan Singh Makwana  Department of Computer Science, MITS Gwalior, Madhya Pradesh, India
  • Nidhi Dandotiya  Department of Computer Science, ITM University, Gwalior, Madhya Pradesh, India

DOI:

https://doi.org/10.32628/CSEIT228391

Keywords:

Machine Learning, Toxic Comments, LSTM, GRU, RNN, BiLSTM.

Abstract

Most comments posted in online public forums are constructive, but a significant proportion is toxic. Raw comments contain many errors, so the dataset is first processed through a variety of tasks that convert the raw comments before they are fed to classification models built with a machine learning (ML) method. In this study, we propose classifying toxic comments with an ML approach on a multilingual toxic comment dataset. Logistic regression is applied to the processed dataset to distinguish toxic comments from non-toxic comments. The multi-headed model estimates toxicity (obscene, insult, severe toxic, threat, and identity-hate) or non-toxicity. We implemented four models (LSTM, GRU, RNN, and BiLSTM) to detect toxic comments. All models are written in Python 3 and have a simple structure that can be adapted to other tasks. The classification results obtained with the proposed models are presented. We conclude that all models solve the task effectively, but BiLSTM is the most effective and gives the best practicable accuracy.
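
The pipeline summarized in the abstract (clean the raw comment, then score it with one output per toxicity category) can be illustrated with a minimal Python 3 sketch. This is not the authors' implementation: the function names, the cleaning rules, the bag-of-words scoring, and the hand-set demo weights are all assumptions made for illustration, and a trained LSTM, GRU, RNN, or BiLSTM would replace the toy scoring step.

```python
import math
import re

# Categories named in the abstract; the identifiers themselves are assumptions.
LABELS = ["obscene", "insult", "severe_toxic", "threat", "identity_hate"]


def clean_comment(text):
    """Lowercase a raw comment, drop URLs, digits and punctuation, and tokenize.

    Unicode letters are kept so that non-English comments survive cleaning.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[\W\d_]+", " ", text)       # keep letters only
    return text.split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_heads(tokens, weights, biases):
    """Return one independent probability ("head") per toxicity label.

    Each head is a logistic model over token counts, standing in for the
    output layer of a trained network.
    """
    scores = {}
    for label in LABELS:
        z = biases.get(label, 0.0)
        for tok in tokens:
            z += weights.get(label, {}).get(tok, 0.0)
        scores[label] = sigmoid(z)
    return scores


def is_toxic(scores, threshold=0.5):
    """A comment counts as toxic if any head reaches the threshold."""
    return any(p >= threshold for p in scores.values())


# Hypothetical hand-set parameters, for demonstration only (not learned).
DEMO_WEIGHTS = {"insult": {"idiot": 3.0}, "threat": {"kill": 4.0}}
DEMO_BIASES = {label: -2.0 for label in LABELS}

if __name__ == "__main__":
    tokens = clean_comment("You are an IDIOT!!! http://example.com")
    print(tokens)
    print(predict_heads(tokens, DEMO_WEIGHTS, DEMO_BIASES))
```

Because each label has its own sigmoid output, a single comment can be flagged under several categories at once, which is what distinguishes this multi-label setup from ordinary multi-class classification.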

Published

2022-08-30

Section

Research Articles

How to Cite

[1]
Monika Dandotiya, Dr. Rajni Ranjan Singh Makwana, Nidhi Dandotiya, "An Intense Study of Machine Learning Research Approach to Identify Toxic Comments", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 8, Issue 4, pp. 71-81, July-August 2022. Available at doi: https://doi.org/10.32628/CSEIT228391