Conversational AI for Blind Users: Image Recognition and Assistance Using LSTM
DOI: https://doi.org/10.32628/CSEIT25113359

Abstract
Image captioning is a sophisticated computer vision task: generating descriptive textual summaries for photographs. By combining concepts from computer vision and natural language processing, the technology understands the content of an image and conveys it in a human-readable form. For blind people, the need for image captioning stems from the basic demand for equal access to information and for inclusivity. By providing descriptive textual information about the contents of images, this technology enables people with visual impairments to understand visual content that would otherwise be unavailable to them. Image captioning promotes a sense of autonomy and lessens reliance on sighted help by enabling blind people to explore and comprehend the visual elements of their surroundings independently. Several existing programs and systems already use image captioning technology to meet the needs of the visually impaired. This research proposes a novel method that uses Convolutional Neural Network (CNN) techniques to construct an image captioning system that improves accessibility for people with visual impairments. The system aims to provide meaningful and thorough descriptions of the visuals blind users frequently encounter in their daily lives. Using a CNN, the model extracts and interprets pertinent features from images and produces meaningful captions that are then conveyed to users through assistive technologies such as speech synthesis. The project addresses the critical need for inclusive technology by offering a solution that helps close the visual information gap and enables people with visual impairments to interact with and navigate the visual world more effectively.
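The encoder-decoder pipeline implied by the title and abstract (CNN feature extraction followed by LSTM caption generation) can be sketched as below. This is a minimal illustrative NumPy sketch, not the paper's implementation: the CNN feature extractor is stubbed out as a fixed feature vector, the vocabulary is a toy list, and all weights are random, so the emitted "caption" is meaningless; the point is only the data flow from image features to a word sequence that a speech synthesizer could then read aloud.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and dimensions (all hypothetical, for illustration only)
VOCAB = ["<start>", "<end>", "a", "person", "crossing", "the", "street"]
FEAT, HID = 16, 32  # assumed CNN feature size and LSTM hidden size


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """A plain NumPy LSTM cell (input, forget, output, candidate gates)."""

    def __init__(self, in_dim, hid_dim):
        # One stacked weight matrix for all four gates
        self.W = rng.normal(0, 0.1, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g          # update cell state
        h = o * np.tanh(c)         # update hidden state
        return h, c


def generate_caption(image_features, max_len=10):
    """Greedy decoding: seed the LSTM state from CNN features, then emit
    one word per step until <end> or max_len is reached."""
    cell = LSTMCell(len(VOCAB), HID)
    W_init = rng.normal(0, 0.1, (HID, FEAT))        # image features -> h0
    W_out = rng.normal(0, 0.1, (len(VOCAB), HID))   # hidden -> vocab logits

    h, c = np.tanh(W_init @ image_features), np.zeros(HID)
    word, caption = "<start>", []
    for _ in range(max_len):
        x = np.eye(len(VOCAB))[VOCAB.index(word)]   # one-hot previous word
        h, c = cell.step(x, h, c)
        word = VOCAB[int(np.argmax(W_out @ h))]     # greedy next word
        if word == "<end>":
            break
        caption.append(word)
    return " ".join(caption)


# Stand-in for features a pretrained CNN would extract from an image
features = rng.normal(size=FEAT)
print(generate_caption(features))
```

In a real system, the random encoder and decoder weights would be learned end-to-end on paired image-caption data, and the resulting caption string would be passed to a text-to-speech engine for the blind user.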
License
Copyright (c) 2025 International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.