Image Caption Generator Using Neural Networks
DOI:
https://doi.org/10.32628/CSEIT21736
Keywords:
Convolutional Neural Network, Recurrent Neural Network, YOLO, Deep Learning
Abstract
In this paper, we focus on one of the visual recognition tasks in computer vision: image captioning. The goal of the model is to generate captions for an image automatically using deep learning techniques. First, a Convolutional Neural Network (InceptionV3) is used to detect the objects in the image. Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) with an attention mechanism are then used to generate a syntactically and semantically correct caption for the image based on the detected objects. In our project, we work with a traffic sign dataset that has been captioned using the process described above. Such a model can be especially useful for visually impaired people who need to cross roads safely.
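The attention step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how attention scores are computed, so the dot-product scoring below is an assumption chosen for brevity, and the names `softmax`, `attend`, `region_features`, and `hidden_state` are hypothetical. The toy feature vectors stand in for CNN encoder outputs; this is not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, hidden_state):
    """Return (weights, context) for one decoding step.

    region_features: list of feature vectors, one per image region
                     (stand-ins for CNN encoder outputs).
    hidden_state:    the decoder's current hidden state, with the same
                     length as each feature vector.
    Scoring by dot product is an assumption, not taken from the paper.
    """
    # One relevance score per region.
    scores = [sum(f * h for f, h in zip(feat, hidden_state))
              for feat in region_features]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the region features.
    dim = len(hidden_state)
    context = [sum(w * feat[i] for w, feat in zip(weights, region_features))
               for i in range(dim)]
    return weights, context

# Toy example: three regions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = attend(features, hidden)
```

In a full captioning model, the context vector would be fed to the LSTM together with the previous word at every step, so that each generated word can draw on different image regions.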
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.