Image Caption Generator Using Neural Networks

Sujeet Kumar Shukla; Saurabh Dubey; Aniket Kumar Pandey; Vineet Mishra; Mayank Awasthi; Vinay Bhardwaj

doi:10.32628/CSEIT21736

Authors

Sujeet Kumar Shukla, B. Tech. Scholar, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
Saurabh Dubey, B. Tech. Scholar, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
Aniket Kumar Pandey, B. Tech. Scholar, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
Vineet Mishra, B. Tech. Scholar, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
Mayank Awasthi, B. Tech. Scholar, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
Vinay Bhardwaj, Assistant Professor, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India

Keywords:

Convolutional Neural Network, Recurrent Neural Network, YOLO, Deep Learning

Abstract

In this paper, we focus on image captioning, one of the visual recognition tasks in computer vision. The goal of the model is to generate captions for an image automatically using deep learning techniques. First, a Convolutional Neural Network (InceptionV3) is used to detect the objects in the image. Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) with an attention mechanism are then used to generate a syntactically and semantically correct caption for the image based on the detected objects. In our project, we work with a traffic sign dataset that has been captioned using the process described above. The model is intended to help visually impaired people who need to cross roads safely.
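
The attention step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify which attention variant is used, so the dot-product scoring below is an assumption made for illustration, as are the function names `softmax` and `attend` and the toy feature values. The sketch shows how a decoder state might weight image-region features to form a context vector; it is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Soft attention over image regions (assumed dot-product scoring).

    region_features: one feature vector per image region
                     (stand-in for CNN encoder output; hypothetical values).
    hidden_state:    current decoder (LSTM) hidden state, same dimensionality.
    Returns (weights, context): attention weights that sum to 1 and the
    weighted sum of the region features.
    """
    # Score each region by its similarity to the decoder state.
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(hidden_state)
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * region[i] for w, region in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three regions with 2-dimensional features (made-up numbers).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
```

In a full captioning model, the context vector would be fed to the LSTM together with the previous word embedding at each decoding step, so that each generated word can draw on different parts of the image.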
