
International Journal For Multidisciplinary Research
E-ISSN: 2582-2160 • Impact Factor: 9.24
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

GenMedia: A Research Project on Personalized Multi-Modal Media Generation Using Stable Diffusion and AudioCraft
Author(s) | Mr. Chinmay Mukund Kamble, Mr. Nakul Avinash Kamatkar |
---|---|
Country | India |
Abstract | In this paper, we present GenMedia, a research project on personalized multi-modal media generation using Stable Diffusion models and AudioCraft's MusicGen. The project explores fine-tuning Stable Diffusion 2.0 and 3.5 with DreamBooth, with particular emphasis on latent space optimization, prompt adherence, and image quality. These improvements enable the generation of high-quality, context-aware images from user-provided text prompts and personalized datasets. Additionally, we investigate fine-tuning AudioCraft's MusicGen using Dora to synthesize personalized audio content from custom instrumental datasets and text prompts. FFmpeg is then used to combine the generated images and audio into a single video output. Through extensive experiments, we evaluate these models on their ability to create highly personalized and contextually relevant media content. This research highlights the potential of advanced generative models for multi-modal AI and for future work in personalized media generation. |
Keywords | Stable Diffusion, AudioCraft, Multi-Modal Media Generation, Generative AI |
Field | Computer > Artificial Intelligence / Simulation / Virtual Reality |
Published In | Volume 7, Issue 4, July-August 2025 |
Published On | 2025-07-20 |
DOI | https://doi.org/10.36948/ijfmr.2025.v07i04.51404 |
Short DOI | https://doi.org/g9tz8p |
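
The abstract's final pipeline step, combining a generated image and a generated audio track into a video with FFmpeg, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the file names, and the choice of the H.264 and AAC codecs are assumptions made for illustration. Only standard FFmpeg command-line options are used.

```python
import shlex
import subprocess
from pathlib import Path


def build_ffmpeg_command(image_path, audio_path, output_path):
    """Build an ffmpeg argument list that shows one still image for the
    duration of an audio track. All paths are hypothetical placeholders."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-loop", "1",            # repeat the single input image indefinitely
        "-i", str(image_path),   # input 0: generated image
        "-i", str(audio_path),   # input 1: generated audio
        "-c:v", "libx264",       # assumed video codec (H.264)
        "-tune", "stillimage",   # x264 tuning for static content
        "-c:a", "aac",           # assumed audio codec
        "-pix_fmt", "yuv420p",   # widely supported pixel format
        "-shortest",             # stop when the audio (shortest stream) ends
        str(output_path),
    ]


def render_video(image_path, audio_path, output_path):
    """Run ffmpeg; requires an ffmpeg binary on PATH."""
    subprocess.run(
        build_ffmpeg_command(image_path, audio_path, output_path),
        check=True,
    )


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works
    # even where ffmpeg is not installed.
    cmd = build_ffmpeg_command(
        Path("generated.png"), Path("generated.wav"), Path("genmedia.mp4")
    )
    print(shlex.join(cmd))
```

Separating command construction from execution keeps the sketch inspectable without FFmpeg installed; calling `render_video` with real image and audio files would produce the video.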

CrossRef DOI is assigned to each research paper published in our journal. The IJFMR DOI prefix is 10.36948/ijfmr.
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.
