Abstract:

Recent text-to-music models enable users to generate realistic music audio with a simple command. However, editing music audio remains challenging because of two conflicting desiderata: performing fine-grained alterations on the audio while maintaining a simple user interface. To address this challenge, we propose Audio Prompt Adapter (or AP Adapter), a lightweight addition to pretrained text-to-music models. We use AudioMAE to extract features from the input audio and construct attention-based adapters that feed these features into the internal layers of AudioLDM2, a diffusion-based text-to-music model. With only 22M trainable parameters, AP Adapter lets users harness both global (e.g., style and timbre) and local (e.g., melody) aspects of music, using the original audio and a short text as inputs. Through objective and subjective studies, we evaluate AP Adapter on three tasks: timbre transfer, style transfer, and accompaniment generation. Additionally, we demonstrate its effectiveness on out-of-domain audio containing instruments unseen during training.
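
The abstract does not spell out the form of the attention-based adapters, so the following is only a minimal NumPy sketch of one plausible design: a second cross-attention branch over audio features whose output is added to the existing text cross-attention output. The function names, the additive combination, the `audio_scale` knob, and all dimensions are assumptions made for illustration, and random arrays stand in for real AudioMAE and text-encoder features.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    # Single-head scaled dot-product attention from latents to a context sequence.
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def adapted_layer(latents, text_feats, audio_feats, params, audio_scale=1.0):
    # Hypothetical adapter: the text branch mimics a pretrained layer,
    # the audio branch is the added (trainable) part. Assumed, not confirmed.
    text_out = cross_attention(
        latents, text_feats, params["w_q"], params["w_k_text"], params["w_v_text"]
    )
    audio_out = cross_attention(
        latents, audio_feats, params["w_q"], params["w_k_audio"], params["w_v_audio"]
    )
    return text_out + audio_scale * audio_out


rng = np.random.default_rng(0)
d_latent, d_text, d_audio, d_attn = 8, 6, 5, 4  # illustrative sizes only
params = {
    "w_q": rng.normal(size=(d_latent, d_attn)),
    "w_k_text": rng.normal(size=(d_text, d_attn)),
    "w_v_text": rng.normal(size=(d_text, d_attn)),
    "w_k_audio": rng.normal(size=(d_audio, d_attn)),
    "w_v_audio": rng.normal(size=(d_audio, d_attn)),
}
latents = rng.normal(size=(10, d_latent))     # stand-in for diffusion latents
text_feats = rng.normal(size=(3, d_text))     # stand-in for text-encoder output
audio_feats = rng.normal(size=(7, d_audio))   # stand-in for AudioMAE features

out = adapted_layer(latents, text_feats, audio_feats, params, audio_scale=0.5)
baseline = adapted_layer(latents, text_feats, audio_feats, params, audio_scale=0.0)
```

In this sketch, setting `audio_scale` to zero recovers the text-only layer, which illustrates why an adapter of this kind can be added to a pretrained model without retraining its original weights.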


© 2024 International Society for Music Information Retrieval