AI-Powered Multi-Modal Form Filling: Advancing Accessibility through Voice and Image Recognition
DOI: https://doi.org/10.32628/CSEIT25111203
Keywords: AI-Driven Multi-Modal Form Filling, AI-Powered Accessibility, Multi-Modal Input Recognition, Voice-to-Text Processing, Image-to-Text-Processing, Optical Character Recognition (OCR), Digital Accessibility Solutions, Intelligent Data Extraction
Abstract
Rapid advances in AI-based image and voice recognition have made digital interactions considerably more accessible. This article discusses a multi-modal form-filling approach that integrates real-time voice transcription with image-based data extraction. The integration reduces the cognitive and physical burden of traditional form filling and improves user engagement and inclusivity. We demonstrate how the technology streamlines data entry and improves accessibility for diverse user groups, setting a new benchmark for user-friendly digital interactions. Results from implementations across different sectors show higher processing efficiency, lower error rates, and high user satisfaction, reinforcing the potential of AI to make digital forms more accessible and efficient without sacrificing accuracy.
Copyright (c) 2025 International Journal of Scientific Research in Computer Science, Engineering and Information Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.