Pitch ControlNet: Continuous Pitch Control for Monophonic Instrument Sound Generation

Dabin Kim (Korea Advanced Institute of Science and Technology)*, Junwon Lee (KAIST), Minseo Kim (KAIST), Juhan Nam (KAIST)

This paper will be presented in person


In monophonic instrument sound generation, integrating continuous pitch control into text-to-audio (TTA) models is crucial for practical music production. To address this, we propose Pitch-ControlNet, a framework that uses ControlNet to make the audio generated by a pretrained AudioLDM follow time-varying pitch expressions. Our approach enables sequence-level pitch manipulation through fundamental frequency (f0) contours while retaining the high-quality text-prompted generation of the pretrained model. Experimental results show that the framework consistently achieves high pitch accuracy across a wide frequency range while preserving the target instrument’s timbre and high audio quality. The model’s potential for practical use in music production is showcased on our demo website.
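
To make the notion of an f0 contour as a control signal more concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not specify the contour format, frame rate, or how pitch accuracy is computed, so everything here is an assumption: the function names (`vibrato_contour`, `cents_error`, `pitch_accuracy`), the 100 frames-per-second rate, the vibrato parameters, and the 50-cent tolerance are all hypothetical choices made for illustration.

```python
import math


def vibrato_contour(base_hz, duration_s, frame_rate=100,
                    vib_rate_hz=5.0, vib_depth_cents=30.0):
    """Build a frame-wise f0 contour (Hz) with sinusoidal vibrato.

    All parameters are illustrative assumptions, not values from the paper.
    """
    n_frames = int(round(duration_s * frame_rate))
    contour = []
    for i in range(n_frames):
        t = i / frame_rate
        # Pitch deviation in cents; 1200 cents equal one octave.
        cents = vib_depth_cents * math.sin(2.0 * math.pi * vib_rate_hz * t)
        contour.append(base_hz * 2.0 ** (cents / 1200.0))
    return contour


def cents_error(f_est_hz, f_ref_hz):
    """Signed pitch difference between two frequencies, in cents."""
    return 1200.0 * math.log2(f_est_hz / f_ref_hz)


def pitch_accuracy(estimated_hz, target_hz, tolerance_cents=50.0):
    """Fraction of frames whose estimated f0 is within the tolerance.

    Frames with a non-positive frequency (treated here as unvoiced)
    count as misses. The tolerance is an assumed value.
    """
    if len(estimated_hz) != len(target_hz):
        raise ValueError("contours must have the same number of frames")
    if not target_hz:
        raise ValueError("contours must not be empty")
    hits = sum(
        1
        for est, ref in zip(estimated_hz, target_hz)
        if est > 0 and ref > 0
        and abs(cents_error(est, ref)) <= tolerance_cents
    )
    return hits / len(target_hz)


if __name__ == "__main__":
    # Target contour: one second of A4 (440 Hz) with light vibrato.
    target = vibrato_contour(440.0, duration_s=1.0)
    # Stand-in for an f0 track estimated from generated audio:
    # the same contour, uniformly 10 cents sharp.
    estimated = [f * 2.0 ** (10.0 / 1200.0) for f in target]
    print(pitch_accuracy(estimated, target))
```

In practice, the estimated contour would come from running a pitch tracker on the generated audio; the sketch substitutes a synthetic contour so that it stays self-contained.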