An Efficient Cross-Lingual BERT Model for Text Classification and Named Entity Extraction in Multilingual Dataset
DOI:
https://doi.org/10.32628/CSEIT217353

Keywords:
Natural Language Processing, BERT, Transformers, Multilingual NER

Abstract
With the rise of the internet, people are inundated with information from sources such as websites, blogs, articles, social media posts and comments, and e-news portals, and most of this data is unstructured. In this paper, the authors explore the effectiveness of the cross-lingual BERT model, M-BERT, for text classification and named entity extraction on multilingual data. Datasets in three languages, namely French, German, and Portuguese, are used to evaluate the model's performance.
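The paper itself does not include code; the sketch below only illustrates the kind of pipeline the abstract describes, using the Hugging Face Transformers library and the public bert-base-multilingual-cased checkpoint. The model name, label count, and example sentence are assumptions, not the authors' exact setup, and the classification head shown here is untrained and would still need fine-tuning on the French, German, and Portuguese NER datasets.

```python
# Minimal sketch (assumed setup, not the authors' code): probing a cross-lingual
# BERT checkpoint for token classification (NER-style tagging).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# bert-base-multilingual-cased covers French, German and Portuguese, among others.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=9,  # assumed CoNLL-style BIO tag set; adjust to the target dataset
)

# The same tokenizer and encoder handle text in any of the evaluated languages.
sentence = "Angela Merkel besuchte Paris im Juli."  # hypothetical German example
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, num_labels)

predictions = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, predictions):
    # Label ids are meaningless until the head is fine-tuned on labelled NER data.
    print(token, label_id)
```

For sequence-level text classification, the same checkpoint can be loaded with AutoModelForSequenceClassification instead, keeping the tokenizer and training loop otherwise unchanged.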
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.