
International Journal For Multidisciplinary Research
E-ISSN: 2582-2160 | Impact Factor: 9.24
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

Conversational AI Video Assistant
Author(s) | Mahammad Saadullah, Musrat Sultana, Dr. K. Rajitha, R. MohanKrishna Ayyappa |
---|---|
Country | India |
Abstract | This paper introduces a Conversational AI Video Assistant that lets users interact with video content. It processes user inputs, transcribes the audio track, analyzes scenes, and returns context-aware responses in near real time. The system uses Whisper for audio transcription, custom object-detection models built with OpenCV and TensorFlow for visual analysis, and Coqui TTS for natural-sounding spoken feedback, all exposed through a Gradio-based interface. Evaluation on multiple test videos shows that processing time scales linearly with video length, with an average real-time factor (processing time divided by video duration) of 0.173. In other words, processing takes roughly 17% of a video's playback time, which supports near-real-time use. In terms of effectiveness, the system achieves an overall accuracy of 0.86, precision of 0.83, recall of 0.88, and F1-score of 0.85, indicating that it reliably delivers relevant responses. The assistant targets several practical domains: education, by enabling interactive learning from instructional videos; accessibility, by providing audio descriptions for visually impaired users; and smart home systems, through contextual assistance. By combining multimodal processing with an intuitive interface, the assistant offers a new way to engage with video content interactively and meaningfully. |
Keywords | Conversational AI, Video Analysis, Scene Understanding, Multimodal Interaction, User Experience |
Field | Computer > Artificial Intelligence / Simulation / Virtual Reality |
Published In | Volume 7, Issue 3, May-June 2025 |
Published On | 2025-05-28 |
DOI | https://doi.org/10.36948/ijfmr.2025.v07i03.46053 |
Short DOI | https://doi.org/g9mn6f |
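The two headline evaluation figures in the abstract can be checked with standard definitions. The sketch below assumes the conventional meanings: real-time factor (RTF) is processing time divided by media duration, and F1 is the harmonic mean of precision and recall. The 600-second video length is a hypothetical example and does not come from the paper's test set. The function names are illustrative and do not refer to the authors' code.

```python
"""Sanity checks for the evaluation figures quoted in the abstract.

Assumes conventional definitions: RTF = processing time / media duration,
and F1 = harmonic mean of precision and recall. The video length used in
the example is hypothetical.
"""


def real_time_factor(processing_seconds: float, media_seconds: float) -> float:
    """Return processing time divided by media duration.

    Values below 1.0 mean the pipeline finishes faster than the video plays.
    """
    if media_seconds <= 0:
        raise ValueError("media duration must be positive")
    return processing_seconds / media_seconds


def expected_processing_seconds(rtf: float, media_seconds: float) -> float:
    """Invert the RTF definition to estimate processing time for a clip.

    A constant RTF is what "processing time scales linearly with video
    length" implies.
    """
    return rtf * media_seconds


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical 10-minute (600 s) video at the reported average RTF of 0.173.
    est = expected_processing_seconds(0.173, 600.0)
    print(f"estimated processing time: {est:.1f} s")  # prints 103.8 s
    # Reported precision 0.83 and recall 0.88 give an F1 of about 0.85.
    print(f"F1: {f1_score(0.83, 0.88):.2f}")  # prints 0.85
```

The check confirms that the reported F1-score of 0.85 is consistent with the reported precision (0.83) and recall (0.88). It also shows that an RTF of 0.173 would turn a 10-minute video into roughly 104 seconds of processing.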
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.
