Boundary Regression for Leitmotif Detection in Music Audio

Sihun Lee (Sogang University)*, Dasaem Jeong (Sogang University)

This paper will be presented in person

Abstract:

Leitmotifs are musical phrases that recur in various forms throughout a piece. Detecting occurrences of leitmotifs in audio recordings is a highly challenging task because the motifs appear in diverse variations and with varying instrumentation. Leitmotif detection can be regarded as a subcategory of audio event detection, in which the presence of a leitmotif is predicted at the frame level. However, because leitmotifs carry distinct temporal structure and musical coherence, a more holistic approach, akin to bounding box regression in visual object detection, can be helpful. Such an approach would capture each motif occurrence in its entirety rather than fragmenting it into individual frames, thereby preserving its musical integrity and improving detection accuracy. We present our results on tackling leitmotif detection as a boundary regression task.
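
To make the contrast between frame-level prediction and boundary regression concrete, the following is a minimal sketch and not the authors' implementation. It assumes binary frame-level activations and a fixed hop size in seconds. The names `frames_to_segments` and `interval_iou` are hypothetical helpers introduced only for illustration. The first groups active frames into (onset, offset) segments, which are the kind of target a boundary regressor would predict directly. The second computes a one-dimensional intersection over union, the temporal analogue of the overlap measure commonly used for bounding boxes.

```python
from typing import List, Tuple

# A segment is an (onset, offset) pair in seconds (illustrative convention).
Segment = Tuple[float, float]


def frames_to_segments(frames: List[int], hop_sec: float) -> List[Segment]:
    """Group runs of consecutive active frames into (onset, offset) segments."""
    segments: List[Segment] = []
    start = None  # index of the first frame of the current active run
    for i, active in enumerate(frames):
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start * hop_sec, i * hop_sec))
            start = None
    if start is not None:
        # The run extends to the end of the frame sequence.
        segments.append((start * hop_sec, len(frames) * hop_sec))
    return segments


def interval_iou(a: Segment, b: Segment) -> float:
    """1D intersection over union between two time intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Reference: one motif occurrence spanning frames 1 to 5 (hop of 0.5 s assumed).
reference = frames_to_segments([0, 1, 1, 1, 1, 1, 0], hop_sec=0.5)

# A frame-level prediction that misses a single frame splits the motif in two.
fragmented = frames_to_segments([0, 1, 1, 0, 1, 1, 0], hop_sec=0.5)

# A boundary-style prediction describes the occurrence with one interval.
predicted: Segment = (1.0, 3.0)
overlap = interval_iou(predicted, reference[0])
```

In this toy case a single missed frame turns one motif occurrence into two fragments, whereas an interval prediction stays whole and can be scored against the reference by its overlap.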