A Review on Deep Learning Based Lip-Reading

Authors

  • Kartik Datar  Department of Computer Science, MPSTME, NMIMS, Shirpur, Maharashtra, India
  • Meet N. Gandhi  Department of Computer Science, MPSTME, NMIMS, Shirpur, Maharashtra, India
  • Priyanshu Aggarwal  Department of Computer Science, MPSTME, NMIMS, Shirpur, Maharashtra, India
  • Mayank Sohani  Faculty, Computer Science, MPSTME, NMIMS, Shirpur, Maharashtra, India

DOI:

https://doi.org/10.32628/CSEIT206140

Keywords:

Neural Network, Convolutional Neural Network (CNN), Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN).

Abstract

In today's world of rapid development, deep learning has made a significant impact on tasks that seemed impossible only a few years ago. Deep learning has been able to solve problems that are complex even for classical machine learning algorithms. The task of lip reading, i.e., converting lip movements to text, has been performed by various methods; one of the most successful is LipNet, which provides end-to-end conversion from lip movements to text. This end-to-end conversion of lip movements to words is possible because of the availability of large datasets and the development of deep learning methods such as Convolutional Neural Networks and Recurrent Neural Networks. The use of deep learning in lip reading is a recent concept and addresses upcoming real-world challenges such as virtual reality systems, assisted driving systems, sign language recognition, movement recognition, and improving hearing aids via Google Lens. Various other approaches, along with different datasets, are explained in the paper.
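The end-to-end pipeline the abstract describes (video frames → spatiotemporal features → recurrent sequence model → per-frame character predictions) can be illustrated at the shape level with a minimal NumPy sketch. This is a hypothetical toy with random weights and simplified stand-ins (frame pooling instead of a real 3D convolution, a vanilla RNN instead of an LSTM/GRU); all sizes and variable names are illustrative assumptions, not the architecture of any cited system.

```python
import numpy as np

rng = np.random.default_rng(0)

T, H, W = 16, 8, 8          # frames, height, width (toy mouth-region crops)
video = rng.standard_normal((T, H, W))

# Stand-in for spatiotemporal convolution: project each flattened frame
# to a small feature vector with a random linear map.
feat_dim = 4
W_feat = rng.standard_normal((H * W, feat_dim)) * 0.1
features = video.reshape(T, H * W) @ W_feat          # shape (T, feat_dim)

# Minimal vanilla RNN (stand-in for the LSTM/GRU used in practice):
# each hidden state depends on the current features and the previous state.
hid = 6
W_xh = rng.standard_normal((feat_dim, hid)) * 0.1
W_hh = rng.standard_normal((hid, hid)) * 0.1
h = np.zeros(hid)
states = []
for t in range(T):
    h = np.tanh(features[t] @ W_xh + h @ W_hh)
    states.append(h)
states = np.stack(states)                            # shape (T, hid)

# Per-frame character logits (27 = 26 letters + one blank symbol,
# as would feed a CTC-style loss in an end-to-end system).
n_chars = 27
W_out = rng.standard_normal((hid, n_chars)) * 0.1
logits = states @ W_out                              # shape (T, n_chars)
print(logits.shape)
```

The point of the sketch is only the data flow: one feature vector per frame, one hidden state per frame, one distribution over characters per frame; real systems replace each stand-in with trained 3D-convolutional and recurrent layers.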

References

  1. K. Noda, Y. Yamaguchi, K. Nakadai, H. G. Okuno, T. Ogata, "Lipreading using Convolutional Neural Network," INTERSPEECH 2014, Singapore, pp. 1149-1153.
  2. J. Luettin, N. Thacker, and S. Beet, "Visual speech recognition using active shape models and hidden Markov models," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP.1996.543246
  3. T. Cootes, G. Edwards, and C. Taylor, "Active appearance models," IEEE Transactions on Pattern Analysis and Machine Intelligence. DOI: 10.1109/34.927467
  4. Y. Lu, J. Yan, and K. Gu, "Review on Automatic Lip-Reading Techniques," International Journal of Pattern Recognition and Artificial Intelligence. DOI: 10.1142/S0218001418560074
  5. J. S. Chung, A. Senior, O. Vinyals, and A. Zisserman, "Lip Reading Sentences in the Wild," Department of Engineering Science, University of Oxford; Google DeepMind. arXiv:1611.05358
  6. O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and Tell: A Neural Image Caption Generator." arXiv:1411.4555 [cs.CV]
  7. F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to Forget: Continual Prediction with LSTM." arXiv:1509.01602
  8. T. Yoshida, K. Nakadai, and H. G. Okuno, "Automatic speech recognition improved by two-layered audio-visual integration for robot audition," IEEE-RAS International Conference on Humanoid Robots. DOI: 10.1109/ICHR.2009.5379586
  9. H. Kuwabara, K. Takeda, Y. Sagisaka, S. Katagiri, S. Morikawa, and T. Watanabe, "Construction of a Large-scale Japanese Speech Database and its Management System," Proceedings, Vol. 1, IEEE, 1989, pp. 560-563.
  10. S. Young, G. Evermann, M. Gales, T. Hain, X. A. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (for HTK Version 3.4). Cambridge University Engineering Department, 2009.
  11. A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM. DOI: 10.1145/3065386
  12. J. S. Chung and A. Zisserman, "Learning to lip read words by watching videos," Computer Vision and Image Understanding, Elsevier. DOI: 10.1016/j.cviu.2018.02.001

Published

2020-02-29

Section

Research Articles

How to Cite

[1]
Kartik Datar, Meet N. Gandhi, Priyanshu Aggarwal, Mayank Sohani, "A Review on Deep Learning Based Lip-Reading," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 6, Issue 1, pp. 182-188, January-February 2020. Available at DOI: https://doi.org/10.32628/CSEIT206140