Speech Recognition System for Different Kannada Dialects

Authors

  • Hemakumar G, Department of Computer Science, Govt. College for Women (Autonomous), Mandya, Karnataka, India
  • Punithavalli M, Department of Computer Application, Bharathiar University, Coimbatore, Tamil Nadu, India
  • Thippeswamy K, Department of Computer Science & Engineering, Visvesvaraya Technological University, Post-Graduation Centre, Mysore, Karnataka, India

Keywords:

Speech recognition, Kannada dialect, Language model, Normal parameters, Speech enhancement

Abstract

This paper discusses the pronunciation variations that occur across different Kannada dialects, the design of the language model, the building of acoustic models, and finally the recognition of Kannada dialect speech. An algorithm is designed for the recognition of isolated Kannada words and continuous Kannada speech produced by speakers of different dialects. The novelty of the algorithm lies in handling speech from speakers of multiple Kannada dialects recorded with a mini-microphone, headphones, and cell phones; it is also robust in handling different Kannada dialect speech and slightly noisy waveforms. The speech waves were recorded in a natural environment. Speech models are classified by the speaker's dialect, and within each dialect class, sub-classes are designed according to acoustic features. During recognition, a breadth-first matching technique is applied across dialect classes, followed by a depth-first matching technique within the selected dialect class. The recognizer is built using MFCC features and real cepstrum coefficients, and their performance is compared. In these experiments, the real cepstrum coefficients produced a better recognition rate when dealing with multiple dialects of the same language. All computations were made using MATLAB.
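
As an illustration of the real-cepstrum front end described above, the following MATLAB sketch frames a recorded utterance and computes real-cepstrum coefficients per frame. This is a minimal sketch, not the authors' implementation; the file name, the 25 ms frame length, the 10 ms shift, and the 13 retained coefficients are illustrative assumptions.

    % Minimal sketch (not the authors' code): frame one recorded Kannada
    % utterance and compute real-cepstrum coefficients per frame.
    [x, fs] = audioread('kannada_utterance.wav');   % hypothetical input file
    x = x(:, 1);                                    % keep a single channel

    frameLen = round(0.025 * fs);                   % 25 ms analysis frame (assumed)
    hop      = round(0.010 * fs);                   % 10 ms frame shift (assumed)
    numCoeff = 13;                                  % cepstral coefficients kept (assumed)
    n        = (0:frameLen-1)';
    win      = 0.54 - 0.46 * cos(2*pi*n/(frameLen-1));   % Hamming window

    numFrames = 1 + floor((length(x) - frameLen) / hop);
    C = zeros(numFrames, numCoeff);

    for i = 1:numFrames
        idx  = (i-1)*hop + (1:frameLen);
        seg  = x(idx);
        seg  = seg(:) .* win;                       % windowed frame (column vector)
        spec = abs(fft(seg));
        spec(spec < eps) = eps;                     % avoid log(0)
        ceps = real(ifft(log(spec)));               % real cepstrum of the frame
        C(i, :) = ceps(1:numCoeff).';               % keep low-quefrency coefficients
    end
    % C holds one real-cepstrum feature vector per frame; an MFCC front end
    % would be built analogously (log mel filter-bank energies followed by a
    % DCT) for the comparison reported in the paper.

In the hierarchical matching described above, such frame-wise feature matrices would first be compared breadth-first against dialect-class representatives and then depth-first within the acoustic sub-classes of the best-matching dialect.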

Published

2017-10-31

Issue

Volume 2, Issue 5 (September-October 2017)

Section

Research Articles

How to Cite

[1]
Hemakumar G, Punithavalli M, Thippeswamy K, "Speech Recognition System for Different Kannada Dialects", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 5, pp. 180-188, September-October 2017.