Utilizing Deep Learning Techniques for the Classification of Spoken Languages in India

Priyesha Patel; Ayushi Falke; Dipen Waghela; Shah Vishwa

doi:10.32628/CSEIT2390556

Authors

Priyesha Patel Computer Engineering, Parul University, Post Limda, Waghodia, Gujarat, India
Ayushi Falke Computer Engineering, Parul University, Post Limda, Waghodia, Gujarat, India
Dipen Waghela Computer Engineering, Parul University, Post Limda, Waghodia, Gujarat, India
Shah Vishwa Computer Engineering, Parul University, Post Limda, Waghodia, Gujarat, India

DOI:

https://doi.org/10.32628/CSEIT2390556

Keywords:

Speech Recognition, Indian Language, Spoken Language, Pitch, Audio Feature, Machine Learning and Deep Learning

Abstract

Speech-recognition applications are widely accepted in Western countries but are less common in East Asia. The complexity of the languages involved may be one of the main reasons for this lag. Multilingual nations such as India must also be considered if language recognition (of words and phrases) from speech signals is to be achieved. Over the last decade, researchers have called for more study of speech. To identify the spoken language, pitch and audio features are extracted in the initial part of the pre-processing step and then passed to a deep learning classification method. This review discusses various feature extraction approaches along with their advantages and disadvantages, and the distinctions between various machine learning and deep learning approaches. Finally, it points the way for future study in Indian spoken language recognition, as well as in AI technology.