MODELING PREDOMINANT INSTRUMENTATION WITH DIFFUSION
Charis Cochran (Drexel University)*, Youngmoo Kim (Drexel University)
This paper will be presented in person
Most music that people listen to contains multiple instruments, so modeling complex timbres is vital for real-world music analysis and Music Information Retrieval (MIR) tasks. However, timbre modeling remains challenging in musical mixtures, where the definition of timbre is less clear. Additionally, the advent of generative and multi-modal music models necessitates precise timbre representations for finer control in music generation and editing. Building on previous work that highlighted the difficulty of learning instrument timbres in the context of Predominant Instrument Recognition (PIR), we explore the potential of diffusion networks to learn salient timbre representations for predominant instrument generation and timbre transfer. The results highlight the power of generative models to deepen our understanding of instrument timbre in complex scenarios with limited real-world data.