An Efficient Cross-Lingual BERT Model for Text Classification and Named Entity Extraction in Multilingual Dataset

Authors

  • Asoke Nath  Department of Computer Science, St. Xavier's College (Autonomous), Kolkata, West Bengal, India
  • Debapriya Kandar  Department of Computer Science, St. Xavier's College (Autonomous), Kolkata, West Bengal, India
  • Rahul Gupta  Department of Computer Science, St. Xavier's College (Autonomous), Kolkata, West Bengal, India

DOI:

https://doi.org/10.32628/CSEIT217353

Keywords:

Natural Language Processing, BERT, Transformers, Multilingual NER.

Abstract

With the rise of the internet, users are inundated with information from sources such as websites, blogs and articles, social media posts and comments, and e-news portals, and most of this data is unstructured. In this paper, the authors explore the efficiency of the cross-lingual BERT model, i.e. multilingual BERT (M-BERT), for text classification and named entity extraction on multilingual data. Datasets in three languages, namely French, German and Portuguese, are used to evaluate the model's performance.
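To make the setup concrete, the following is a minimal sketch of how such a model can be instantiated, assuming the Hugging Face Transformers library and the publicly released bert-base-multilingual-cased checkpoint; the entity label set, example sentence, and inference step are illustrative assumptions rather than the authors' exact pipeline.

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Illustrative BIO label set; the tag set used in the paper may differ.
labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(labels)
)
# The token-classification head above is randomly initialized; it would be
# fine-tuned on annotated NER data (e.g. French, German and Portuguese
# corpora) before its predictions become meaningful.

# A French example sentence; the same model also accepts German and
# Portuguese text, since M-BERT shares one WordPiece vocabulary.
sentence = "Emmanuel Macron a visité Berlin en mai."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # shape: (1, sequence_length, num_labels)
pred_ids = logits.argmax(dim=-1)[0]      # one predicted label id per subword token

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, pred_ids):
    print(f"{token}\t{labels[int(pred)]}")

The same tokenizer and encoder serve all three evaluation languages because M-BERT uses a single shared WordPiece vocabulary trained on roughly 104 languages; only the classification head on top changes between the text-classification and named-entity-recognition tasks.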

References

  1. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, Google AI Language, 24 May 2019, arXiv:1810.04805.
  2. Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, Andrew McCallum, “Linguistically-Informed Self-Attention for Semantic Role Labeling”, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5027–5038, October 31 - November 4, 2018.
  3. Hang Yan, Bocao Deng, Xiaonan Li, Xipeng Qiu, “TENER: Adapting Transformer Encoder for Named Entity Recognition”, 2019, arXiv:1911.04474.
  4. Jing Li, Aixin Sun, Jianglei Han, Chenliang Li, “A Survey on Deep Learning for Named Entity Recognition”, IEEE Transactions on Knowledge and Data Engineering, 2020.
  5. Andrea Galassi, Marco Lippi, Paolo Torroni, “Attention in Natural Language Processing”, IEEE Transactions on Neural Networks and Learning Systems, 20 Aug 2020, arXiv:1902.02181.
  6. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, “Attention Is All You Need”, 31st Conference on Neural Information Processing Systems, 2017, arXiv:1706.03762.
  7. Zhiheng Huang, Wei Xu, Kai Yu, “Bidirectional LSTM-CRF Models for Sequence Tagging”, 2015, arXiv:1508.01991.
  8. Guillaume Lample, Alexis Conneau, “Cross-lingual Language Model Pretraining”, 2019, arXiv:1901.07291.

Published

2021-06-30

Issue

Volume 7, Issue 3 (May-June 2021)

Section

Research Articles

How to Cite

[1]
Asoke Nath, Debapriya Kandar, Rahul Gupta, "An Efficient Cross-Lingual BERT Model for Text Classification and Named Entity Extraction in Multilingual Dataset", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 7, Issue 3, pp. 280-286, May-June 2021. Available at doi: https://doi.org/10.32628/CSEIT217353