Image Caption Generator Using Neural Networks

Authors

  • Sujeet Kumar Shukla  B. Tech. Scholar, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
  • Saurabh Dubey  B. Tech. Scholar, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
  • Aniket Kumar Pandey  B. Tech. Scholar, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
  • Vineet Mishra  B. Tech. Scholar, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
  • Mayank Awasthi  B. Tech. Scholar, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
  • Vinay Bhardwaj  Assistant Professor, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India

DOI:

https://doi.org/10.32628/CSEIT21736

Keywords:

Convolutional Neural Network, Recurrent Neural Network, YOLO, Deep Learning

Abstract

In this paper, we focus on one of the visual recognition facets of computer vision, namely image captioning. The goal of this model is to generate captions for an image automatically using deep learning techniques. First, a Convolutional Neural Network (InceptionV3) is used to detect the objects in the image. A Recurrent Neural Network (RNN), specifically a Long Short-Term Memory (LSTM) network with an attention mechanism, then generates a syntactically and semantically correct caption for the image based on the detected objects. In our project, we work with a traffic sign dataset that has been captioned using the process described above. This model is extremely useful for visually impaired people who need to cross roads safely.
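
The abstract describes an encoder-decoder pipeline: an InceptionV3 encoder extracts image features, and an attention-equipped LSTM decoder emits the caption word by word. Below is a minimal sketch of that architecture in TensorFlow/Keras. The hyperparameters (vocab_size, embedding_dim, units), the additive (Bahdanau-style) attention formulation, and the single-layer LSTM decoder are illustrative assumptions, not details taken from the paper.

# Minimal sketch of the CNN-encoder / attention-LSTM-decoder pipeline
# outlined in the abstract. Hyperparameters and the attention variant
# are assumptions for illustration, not the authors' exact settings.
import tensorflow as tf

vocab_size = 5000      # assumed vocabulary size
embedding_dim = 256    # assumed word-embedding dimension
units = 512            # assumed decoder/attention width

# 1. Encoder: pretrained InceptionV3 (classification head removed) maps a
#    299x299 image to an 8x8x2048 feature map, i.e. 64 region vectors.
base = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
encoder_cnn = tf.keras.Model(base.input, base.output)

def encode_image(img_batch):
    """img_batch: (B, 299, 299, 3), preprocessed for InceptionV3."""
    features = encoder_cnn(img_batch)                        # (B, 8, 8, 2048)
    return tf.reshape(features, (tf.shape(features)[0], 64, 2048))

# 2. Additive (Bahdanau-style) attention over the 64 image regions.
class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: (B, 64, 2048); hidden: (B, units)
        hidden_t = tf.expand_dims(hidden, 1)                 # (B, 1, units)
        score = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_t)))
        weights = tf.nn.softmax(score, axis=1)               # (B, 64, 1)
        context = tf.reduce_sum(weights * features, axis=1)  # (B, 2048)
        return context, weights

# 3. Decoder: word embedding + LSTM, emitting one token per step.
class Decoder(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.lstm = tf.keras.layers.LSTM(units, return_state=True)
        self.attention = BahdanauAttention(units)
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, token, features, hidden, cell):
        # token: (B, 1) previous word id; hidden/cell: LSTM states.
        context, _ = self.attention(features, hidden)
        x = self.embedding(token)                            # (B, 1, embed)
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        output, hidden, cell = self.lstm(x, initial_state=[hidden, cell])
        return self.fc(output), hidden, cell                 # vocab logits

At inference time, decoding would start from a <start> token with zero-initialized LSTM states, feeding each predicted word back into the decoder until an <end> token or a length limit is reached; either greedy decoding or beam search can be used at this step.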

References

  1. R. Gerber and H. Nagel, "Knowledge representation for the generation of quantified natural language descriptions of vehicle traffic in image sequences," in ICIP. IEEE, 1996.
  2. S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, 9(8), 1997.
  3. O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156-3164, 2015.
  4. L. Yang, K. D. Tang, J. Yang, and L.-J. Li, "Dense captioning with joint inference and visual context," in CVPR, pp. 1978-1987, 2017.
  5. J. Chen, W. Dong, and M. Li, "Image Caption Generator using Deep Neural Networks," March 2018.
  6. J. H. Tan, C. S. Chan, and J. H. Chuah, "Image Captioning with Sparse Recurrent Neural Network," arXiv:1908.10797, 2019.
  7. International Journal of Recent Advances in Multidisciplinary Topics, Vol. 2, No. 4, April 2021.
  8. A. Karpathy and L. Fei-Fei, "Deep visual-semantic alignments for generating image descriptions." https://cs.stanford.edu/people/karpathy/cvpr2015.pdf
  9. W. Wang and H. Hu, "Image captioning using region-based attention joint with time-varying attention," Neural Processing Letters, pp. 1-13, 2019.

Published

2021-06-30

Section

Research Articles

How to Cite

[1] Sujeet Kumar Shukla, Saurabh Dubey, Aniket Kumar Pandey, Vineet Mishra, Mayank Awasthi, and Vinay Bhardwaj, "Image Caption Generator Using Neural Networks," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 7, Issue 3, pp. 01-07, May-June 2021. Available at doi: https://doi.org/10.32628/CSEIT21736