A Review on Automatic Speech Recognition
Keywords:
Speech, Speech Recognition, Human Machine Interaction, Communication
Abstract
As speech recognition technology advances, voice interfaces are being adopted more widely on mobile platforms. While developing a general-purpose Automatic Speech Recognition (ASR) engine that can understand voice commands is important, the contexts in which people interact with their mobile devices change very rapidly. Because of the high processing complexity of the ASR engine, much of the processing of trending data is carried out on cloud platforms. Changing content in news, music, movies and TV series shifts the focus of interaction with voice-based interfaces. Hence, ASR engines trained on a static vocabulary may not be able to adapt to these changing contexts. This paper first describes the problems faced in incorporating dynamically changing vocabulary and contexts into an ASR engine. We then propose a novel solution that shows a relative improvement of 38 percent in utterance accuracy on newly added content without compromising the overall accuracy and stability of the system.
License
Copyright (c) IJSRCSEIT
This work is licensed under a Creative Commons Attribution 4.0 International License.