GenMedia: A Research Project on Personalized Multi-Modal Media Generation Using Stable Diffusion and AudioCraft

Chinmay Mukund Kamble; Nakul Avinash Kamatkar

doi:10.36948/ijfmr.2025.v07i04.51404


Author(s)	Mr. Chinmay Mukund Kamble, Mr. Nakul Avinash Kamatkar
Country	India
Abstract	In this paper, we present GenMedia, a research project that extends Stable Diffusion models and AudioCraft's MusicGen to personalized multi-modal media generation. The project fine-tunes Stable Diffusion 2.0 and 3.5 with DreamBooth, with particular emphasis on latent space optimization, prompt adherence, and image quality. These improvements enable the generation of high-quality, context-aware images from user-provided text prompts and personalized datasets. We also investigate fine-tuning AudioCraft's MusicGen with Dora to synthesize personalized audio from custom instrumental datasets and text prompts. FFmpeg then combines the generated images and audio into cohesive video outputs. Through extensive experiments, we evaluate these models, focusing on their ability to create highly personalized and contextually relevant media content. This research highlights the potential of advanced generative models to transform multi-modal AI and paves the way for future innovations in personalized media generation.
Keywords	Stable Diffusion, AudioCraft, Multi-Modal Media Generation, Generative AI
Field	Computer > Artificial Intelligence / Simulation / Virtual Reality
Published In	Volume 7, Issue 4, July-August 2025
Published On	2025-07-20
DOI	https://doi.org/10.36948/ijfmr.2025.v07i04.51404
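
The abstract states that FFmpeg merges the generated images and audio into video but does not give the exact invocation. The following is a minimal sketch, assuming one generated still image is paired with one generated audio track. It is a Python helper that builds a common FFmpeg command that loops the image for the duration of the audio. The function name and file names are hypothetical and are not taken from the paper. The FFmpeg options themselves are standard flags.

```python
import shlex


def build_ffmpeg_still_video_cmd(image_path: str, audio_path: str, output_path: str) -> list[str]:
    """Hypothetical helper: return an ffmpeg argv that loops one still image over an audio track."""
    return [
        "ffmpeg",
        "-y",                   # overwrite the output file without prompting
        "-loop", "1",           # repeat the single input image indefinitely
        "-i", image_path,       # input 0: generated image (e.g. from Stable Diffusion)
        "-i", audio_path,       # input 1: generated audio (e.g. from MusicGen)
        "-c:v", "libx264",      # encode video as H.264
        "-tune", "stillimage",  # x264 tuning for static content
        "-c:a", "aac",          # encode audio as AAC for MP4 compatibility
        "-pix_fmt", "yuv420p",  # widely playable pixel format
        "-shortest",            # end the output when the shortest stream (the audio) ends
        output_path,
    ]


if __name__ == "__main__":
    # Assumed file names, for illustration only.
    cmd = build_ffmpeg_still_video_cmd("generated.png", "generated.wav", "genmedia.mp4")
    print(shlex.join(cmd))
    # To render, with ffmpeg installed on PATH:
    # import subprocess; subprocess.run(cmd, check=True)
```

Without `-shortest`, the looped image would make the video stream unbounded, so the option ties the video length to the audio. A video built from several generated images would need a different approach, such as FFmpeg's concat demuxer.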


International Journal For Multidisciplinary Research

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
