A Review on Deep Learning Based Lip-Reading

Authors

  • Kartik Datar  Department of Computer Science, MPSTME, NMIMS, Shirpur, Maharashtra, India
  • Meet N. Gandhi  Department of Computer Science, MPSTME, NMIMS, Shirpur, Maharashtra, India
  • Priyanshu Aggarwal  Department of Computer Science, MPSTME, NMIMS, Shirpur, Maharashtra, India
  • Mayank Sohani  Faculty, Computer Science, MPSTME, NMIMS, Shirpur, Maharashtra, India

DOI:

https://doi.org/10.32628/CSEIT206140

Keywords:

Neural Network, Convolutional Neural Network (CNN), Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN).

Abstract

In today's world of rapid development, deep learning has made a significant impact on tasks that seemed impossible only a few years ago. Deep learning has been able to solve problems that are complex even for classical machine learning algorithms. The task of lip reading, i.e., converting lip movements to text, has been performed by various methods; one of the most successful is LipNet, which provides end-to-end conversion from lip movements to text. This end-to-end conversion of lip movements to words is possible because of the availability of large datasets and the development of deep learning methods such as Convolutional Neural Networks and Recurrent Neural Networks. The use of deep learning in lip reading is a recent concept and addresses upcoming real-world challenges such as virtual reality systems, assisted driving systems, sign language recognition, movement recognition, and improving hearing aids via Google Lens. Various other approaches, along with different datasets, are explained in the paper.
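The end-to-end pipeline the abstract describes (video frames → spatiotemporal features → recurrent sequence model → per-frame character predictions) can be illustrated at the shape level with a minimal NumPy sketch. This is a hypothetical toy with random weights and simplified stand-ins (frame pooling instead of a real 3D convolution, a vanilla RNN instead of an LSTM/GRU); all sizes and variable names are illustrative assumptions, not the architecture of any cited system.

```python
import numpy as np

rng = np.random.default_rng(0)

T, H, W = 16, 8, 8          # frames, height, width (toy mouth-region crops)
video = rng.standard_normal((T, H, W))

# Stand-in for spatiotemporal convolution: project each flattened frame
# to a small feature vector with a random linear map.
feat_dim = 4
W_feat = rng.standard_normal((H * W, feat_dim)) * 0.1
features = video.reshape(T, H * W) @ W_feat          # shape (T, feat_dim)

# Minimal vanilla RNN (stand-in for the LSTM/GRU used in practice):
# each hidden state depends on the current features and the previous state.
hid = 6
W_xh = rng.standard_normal((feat_dim, hid)) * 0.1
W_hh = rng.standard_normal((hid, hid)) * 0.1
h = np.zeros(hid)
states = []
for t in range(T):
    h = np.tanh(features[t] @ W_xh + h @ W_hh)
    states.append(h)
states = np.stack(states)                            # shape (T, hid)

# Per-frame character logits (27 = 26 letters + one blank symbol,
# as would feed a CTC-style loss in an end-to-end system).
n_chars = 27
W_out = rng.standard_normal((hid, n_chars)) * 0.1
logits = states @ W_out                              # shape (T, n_chars)
print(logits.shape)
```

The point of the sketch is only the data flow: one feature vector per frame, one hidden state per frame, one distribution over characters per frame; real systems replace each stand-in with trained 3D-convolutional and recurrent layers.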

References

  1. K. Noda, Y. Yamaguchi, K. Nakadai, H. G. Okuno, T. Ogata, "Lipreading using Convolutional Neural Network," INTERSPEECH 2014, Singapore, pp. 1149-1153.
  2. J. Luettin, N. Thacker, and S. Beet, "Visual speech recognition using active shape models and hidden Markov models," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP.1996.543246
  3. T. Cootes, G. Edwards, and C. Taylor, "Active appearance models," IEEE Transactions on Pattern Analysis and Machine Intelligence. DOI: 10.1109/34.927467
  4. Y. Lu, J. Yan, and K. Gu, "Review on Automatic Lip-Reading Techniques," International Journal of Pattern Recognition and Artificial Intelligence. DOI: 10.1142/S0218001418560074
  5. J. S. Chung, A. Senior, O. Vinyals, and A. Zisserman, "Lip Reading Sentences in the Wild," Department of Engineering Science, University of Oxford; Google DeepMind. arXiv:1611.05358
  6. O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and Tell: A Neural Image Caption Generator." arXiv:1411.4555 [cs.CV]
  7. F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to Forget: Continual Prediction with LSTM." arXiv:1509.01602
  8. T. Yoshida, K. Nakadai, and H. G. Okuno, "Automatic speech recognition improved by two-layered audio-visual integration for robot audition," IEEE-RAS International Conference on Humanoid Robots. DOI: 10.1109/ICHR.2009.5379586
  9. H. Kuwabara, K. Takeda, Y. Sagisaka, S. Katagiri, S. Morikawa, and T. Watanabe, "Construction of a Large-scale Japanese Speech Database and its Management System," Proceedings, Vol. 1, IEEE, 1989, pp. 560-563.
  10. S. Young, G. Evermann, M. Gales, T. Hain, X. A. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (for HTK Version 3.4). Cambridge University Engineering Department, 2009.
  11. A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM. DOI: 10.1145/3065386
  12. J. S. Chung and A. Zisserman, "Learning to lip read words by watching videos," Computer Vision and Image Understanding, Elsevier. DOI: 10.1016/j.cviu.2018.02.001

Published

2020-02-29

Section

Research Articles

How to Cite

[1]
Kartik Datar, Meet N. Gandhi, Priyanshu Aggarwal, Mayank Sohani, "A Review on Deep Learning Based Lip-Reading," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 6, Issue 1, pp. 182-188, January-February 2020. Available at DOI: https://doi.org/10.32628/CSEIT206140