A Survey on Multilingual Text Conversion and Speech Generation Workflow
DOI: https://doi.org/10.32628/CSEIT25112826
Keywords: Multilingual, OCR, Summarization, Translation, TTS
Abstract
Human communication is based on language, which enables the sharing of concepts, feelings, and information across cultures. Multilingualism improves communication by enabling people to understand and express themselves in several languages. However, the variety of languages, scripts, and linguistic systems still poses major obstacles to smooth multilingual communication. To overcome these obstacles, technologies such as optical character recognition (OCR), machine translation, text summarization, and text-to-speech (TTS) have become essential. OCR enables text extraction from digital or physical materials, but accuracy problems with different scripts and handwriting styles persist. Despite substantial advances in translation systems, context preservation and low-resource languages remain problematic. Summarization approaches are intended to reduce information overload, yet they often fail to maintain coherence in multilingual settings. TTS systems provide accessibility, especially for visually impaired users, but they must account for linguistic nuances and differences in pronunciation between languages. This survey examines the state-of-the-art techniques, models, and frameworks developed to overcome these linguistic and technological barriers in multilingual text processing, and highlights important research contributions toward more intelligent and inclusive systems.
License
Copyright (c) 2025 International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.