An Intense Study of Machine Learning Research Approach to Identify Toxic Comments

Monika Dandotiya; Dr. Rajni Ranjan Singh Makwana; Nidhi Dandotiya

doi:10.32628/CSEIT228391

Authors

Monika Dandotiya Department of Computer Science, ITM University, Gwalior, Madhya Pradesh, India
Dr. Rajni Ranjan Singh Makwana Department of Computer Science, MITS Gwalior, Madhya Pradesh, India
Nidhi Dandotiya Department of Computer Science, ITM University, Gwalior, Madhya Pradesh, India

Keywords:

Machine Learning, Toxic Comments, LSTM, GRU, RNN, BiLSTM.

Abstract

Most comments posted in online public domains are constructive, but a significant proportion are toxic. Raw comments contain many errors, so the dataset is first processed through a variety of preprocessing tasks that convert the raw comments before they are fed to ML classification models. In this study, we propose classifying toxic comments with an ML approach on a multilingual toxic comment dataset. Logistic regression is applied to the processed dataset to distinguish toxic comments from non-toxic ones. The multi-headed model estimates toxicity across several classes (obscene, insult, severe toxic, threat, and identity-hate) or non-toxicity. We implemented four models (LSTM, GRU, RNN, and BiLSTM) to detect toxic comments. All models are implemented in Python 3 and have a simple structure that can be adapted to other tasks. Classification results are presented for the proposed models. We conclude that all models solve the task effectively, but BiLSTM is the most effective and achieves the best accuracy.
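
As a rough illustration of the pipeline described in the abstract (clean the raw comments, then train one logistic regression head per label), the following is a minimal sketch in pure Python. The toy comments, the two label heads, the helper names, and the hyperparameters are all assumptions made for illustration; they are not the authors' dataset, code, or settings, and the recurrent models (LSTM, GRU, RNN, BiLSTM) are not reproduced here.

```python
import math
import re

# Invented toy data: NOT the dataset used in the study.
COMMENTS = [
    "You are an idiot!!",
    "Thanks for the helpful edit",
    "What a stupid, useless idiot",
    "Great article, well written",
    "I hate this garbage page",
    "Nice work on the references",
]
# One target list per head; only two of the abstract's labels are sketched.
TARGETS = {
    "toxic": [1, 0, 1, 0, 1, 0],
    "insult": [1, 0, 1, 0, 0, 0],
}

def clean(comment):
    """Preprocess a raw comment: lowercase it and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", comment.lower())

def build_vocab(comments):
    """Map each distinct token to a feature index."""
    vocab = {}
    for comment in comments:
        for token in clean(comment):
            vocab.setdefault(token, len(vocab))
    return vocab

def featurize(comment, vocab):
    """Binary bag-of-words, stored as the set of active feature indices."""
    return {vocab[t] for t in clean(comment) if t in vocab}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(rows, targets, n_features, epochs=200, lr=0.5):
    """Fit one binary logistic regression head with plain SGD on log loss."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for active, y in zip(rows, targets):
            p = sigmoid(bias + sum(weights[i] for i in active))
            grad = p - y  # derivative of the log loss w.r.t. the logit
            for i in active:
                weights[i] -= lr * grad
            bias -= lr * grad
    return weights, bias

def predict(comment, vocab, heads):
    """Return one probability per label head for a new comment."""
    active = featurize(comment, vocab)
    return {
        label: sigmoid(bias + sum(weights[i] for i in active))
        for label, (weights, bias) in heads.items()
    }

vocab = build_vocab(COMMENTS)
rows = [featurize(c, vocab) for c in COMMENTS]
heads = {
    label: train_head(rows, ys, len(vocab)) for label, ys in TARGETS.items()
}

if __name__ == "__main__":
    for text in ["You stupid idiot", "Thanks, great work"]:
        print(text, predict(text, vocab, heads))
```

Each label gets its own independent binary classifier over shared features, which is one simple way to read "multi-headed"; a neural variant would instead share an encoder and attach one sigmoid output per label.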

