A Review on Technologies for Group-Aware Malayalam Conversational AI

Authors

  • Husaima Mailanchy T K, Department of Computer Science and Engineering, Universal Engineering College, Thrissur, Kerala, India
  • Suha Narghees A S, Department of Computer Science and Engineering, Universal Engineering College, Thrissur, Kerala, India
  • Taniya T S, Department of Computer Science and Engineering, Universal Engineering College, Thrissur, Kerala, India
  • Vishnu M S, Department of Computer Science and Engineering, Universal Engineering College, Thrissur, Kerala, India
  • Dr. L. C. Manikandan, Department of Computer Science and Engineering, Universal Engineering College, Thrissur, Kerala, India

DOI:

https://doi.org/10.32628/CSEIT2511125

Keywords:

Conversational AI, Voice Separation, Speech-to-Text, Text-to-Speech, Malayalam-English, Multimodal Processing, Natural Language Processing, Bilingual Communication

Abstract

This review paper explores the foundational technologies required to develop a group-aware Conversational AI for the Malayalam-English bilingual community. The objective of the project is to create an AI system capable of interacting naturally in group settings, dynamically recognizing and responding to multiple speakers in real time. The key components of this system are voice separation, which isolates individual speakers’ voices in noisy environments; speech-to-text (STT), which accurately transcribes Malayalam speech that may contain English phrases; and text-to-speech (TTS), which synthesizes natural-sounding speech in Malayalam-English conversational patterns. This review covers recent advancements in each of these three areas by evaluating three core papers per technology. Through this review, we aim to assess the current capabilities of these technologies and how they can be combined to build an accessible, scalable Conversational AI that bridges the language gap for Kerala’s linguistically diverse population.
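The abstract describes a three-stage architecture: voice separation feeds per-speaker streams into STT, and the system replies through TTS. A minimal structural sketch of that flow is shown below; all function names are hypothetical stubs standing in for real models (e.g. SepFormer-style separation, a multilingual ASR model, a neural TTS vocoder, as surveyed in the references), not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    speaker_id: int
    text: str

def separate_voices(mixed_audio: bytes) -> dict[int, bytes]:
    # Hypothetical stub: a real system would apply a separation model
    # (e.g. SepFormer or DPRNN) to split the mixture into one
    # waveform per speaker.
    return {0: mixed_audio, 1: mixed_audio}

def speech_to_text(audio: bytes) -> str:
    # Hypothetical stub: a multilingual ASR model would transcribe
    # code-switched Malayalam-English speech here.
    return "placeholder transcription"

def text_to_speech(text: str) -> bytes:
    # Hypothetical stub: a TTS model would synthesize a
    # Malayalam-English waveform from the response text.
    return b"\x00" * 16000  # placeholder: 1 s of silence at 16 kHz

def group_aware_pipeline(mixed_audio: bytes) -> tuple[list[Transcript], bytes]:
    """Run the three stages described in the abstract in order."""
    streams = separate_voices(mixed_audio)
    transcripts = [Transcript(sid, speech_to_text(a)) for sid, a in streams.items()]
    reply_text = "placeholder response"  # dialogue management is out of scope here
    return transcripts, text_to_speech(reply_text)

transcripts, reply_audio = group_aware_pipeline(b"\x00" * 32000)
```

The sketch only fixes the data flow between the three components; each stub would be replaced by one of the model families reviewed in the referenced papers.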

References

Neri, Julian, and Sebastian Braun. "Towards Real-Time Single-Channel Speech Separation in Noisy and Reverberant Environments." In ICASSP 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2023. DOI: https://doi.org/10.1109/ICASSP49357.2023.10096131

Subakan, Cem, Mirco Ravanelli, Samuele Cornell, Mirko Bronzi, and Jianyuan Zhong. "Attention Is All You Need in Speech Separation." In ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 21-25. IEEE, 2021. DOI: https://doi.org/10.1109/ICASSP39728.2021.9413901

Luo, Yi, Zhuo Chen, and Takuya Yoshioka. "Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation." In ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 46-50. IEEE, 2020. DOI: https://doi.org/10.1109/ICASSP40776.2020.9054266

Baevski, Alexei, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations." Advances in Neural Information Processing Systems 33 (2020): 12449-12460.

Seethala, S. C. (2017). Revolutionizing Data Warehouses in Manufacturing: Big Data-Infused Automation for ETL and Beyond. https://doi.org/10.5281/zenodo.14169254

Seethala, S. C. (2018). AI and Big Data: Transforming Financial Data Warehousing for Predictive Analytics. https://zenodo.org/record/14050624

Seethala, S. C. (2018). Leveraging AI in Cloud Data Warehouses for Manufacturing: A Future-Proof Approach. https://doi.org/10.5281/zenodo.14059537

Seethala, S. C. (2019). Data Warehouse Modernization with AI: A Strategic Path for the Retail Industry. https://doi.org/10.5281/zenodo.14168854

Seethala, S. C. (2019). AI-Enhanced ETL for Modernizing Data Warehouses in Insurance and Risk Management. https://doi.org/10.5281/zenodo.14059551

Seethala, S. C. (2019). Scaling Financial Data Warehouses with AI: Towards a Future-Proof Cloud-Based Ecosystem. International Journal of Scientific Research & Engineering Trends, 5(6). DOI: https://doi.org/10.61137/ijsret.vol.5.issue6.575

Zhang, Yu, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen et al. "Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages." arXiv preprint arXiv:2303.01037 (2023).

Radford, Alec, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. "Robust Speech Recognition via Large-Scale Weak Supervision." In International Conference on Machine Learning, pp. 28492-28518. PMLR, 2023.

Shen, Jonathan, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen et al. "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions." In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4779-4783. IEEE, 2018. DOI: https://doi.org/10.1109/ICASSP.2018.8461368

Kim, Jaehyeon, Sungwon Kim, Jungil Kong, and Sungroh Yoon. "Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search." Advances in Neural Information Processing Systems 33 (2020): 8067-8077.

Ren, Yi, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, and Tie-Yan Liu. "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech." arXiv preprint arXiv:2006.04558 (2020).

Published

10-02-2025

Section

Research Articles
