Conversational AI for Blind Users: Image Recognition and Assistance Using LSTM
DOI: https://doi.org/10.32628/CSEIT25113359

Abstract
Image captioning is a sophisticated computer vision task: generating descriptive textual summaries for photographs. By combining concepts from computer vision and natural language processing, the technology understands the content of an image and conveys it in a human-readable form. For blind people, the need for image captioning stems from the basic demand for equal access to information and for inclusivity. By providing descriptive textual information about the contents of images, this technology enables people with visual impairments to understand visual content that would otherwise be unavailable to them. Image captioning promotes a sense of autonomy and lessens reliance on sighted help by enabling blind people to explore and comprehend the visual elements of their surroundings independently. Several existing programs and systems already use image captioning technology to meet the needs of the visually impaired. This research proposes a novel method that uses Convolutional Neural Network (CNN) techniques to construct an image captioning system that improves accessibility for people with visual impairments. The system aims to provide meaningful and thorough descriptions of the visuals blind users frequently encounter in their daily lives. Using a CNN, the model extracts and interprets pertinent features from images and produces meaningful captions that are then conveyed to users through assistive technologies such as speech synthesis. The project addresses the critical need for inclusive technology by offering a solution that helps close the visual information gap and enables people with visual impairments to interact with and navigate the visual world more effectively.
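The encoder-decoder pipeline implied by the title and abstract (CNN feature extraction followed by LSTM caption generation) can be sketched as below. This is a minimal illustrative NumPy sketch, not the paper's implementation: the CNN feature extractor is stubbed out as a fixed feature vector, the vocabulary is a toy list, and all weights are random, so the emitted "caption" is meaningless; the point is only the data flow from image features to a word sequence that a speech synthesizer could then read aloud.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and dimensions (all hypothetical, for illustration only)
VOCAB = ["<start>", "<end>", "a", "person", "crossing", "the", "street"]
FEAT, HID = 16, 32  # assumed CNN feature size and LSTM hidden size


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """A plain NumPy LSTM cell (input, forget, output, candidate gates)."""

    def __init__(self, in_dim, hid_dim):
        # One stacked weight matrix for all four gates
        self.W = rng.normal(0, 0.1, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g          # update cell state
        h = o * np.tanh(c)         # update hidden state
        return h, c


def generate_caption(image_features, max_len=10):
    """Greedy decoding: seed the LSTM state from CNN features, then emit
    one word per step until <end> or max_len is reached."""
    cell = LSTMCell(len(VOCAB), HID)
    W_init = rng.normal(0, 0.1, (HID, FEAT))        # image features -> h0
    W_out = rng.normal(0, 0.1, (len(VOCAB), HID))   # hidden -> vocab logits

    h, c = np.tanh(W_init @ image_features), np.zeros(HID)
    word, caption = "<start>", []
    for _ in range(max_len):
        x = np.eye(len(VOCAB))[VOCAB.index(word)]   # one-hot previous word
        h, c = cell.step(x, h, c)
        word = VOCAB[int(np.argmax(W_out @ h))]     # greedy next word
        if word == "<end>":
            break
        caption.append(word)
    return " ".join(caption)


# Stand-in for features a pretrained CNN would extract from an image
features = rng.normal(size=FEAT)
print(generate_caption(features))
```

In a real system, the random encoder and decoder weights would be learned end-to-end on paired image-caption data, and the resulting caption string would be passed to a text-to-speech engine for the blind user.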
License
Copyright (c) 2025 International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.