Robust and Accurate Audio Synchronization Using Raw Features From Transcription Models

Johannes Zeitler (International Audio Laboratories Erlangen)*, Ben Maman (Tel Aviv University), Meinard Müller (International Audio Laboratories Erlangen)

Keywords: MIR tasks -> alignment, synchronization, and score following, MIR tasks -> music transcription and annotation

Abstract:

In Music Information Retrieval (MIR), precise synchronization of musical events is crucial for tasks like aligning symbolic information with music recordings or transferring annotations between audio versions. To achieve high temporal accuracy, synchronization approaches integrate onset-related information extracted from music recordings using either traditional signal processing techniques or exploiting symbolic representations obtained by data-driven automatic music transcription (AMT) approaches. In line with this research direction, our paper introduces a high-resolution synchronization approach that combines recent AMT techniques with traditional synchronization methods. Rather than relying on the final symbolic AMT results, we show how to exploit raw onset and frame predictions obtained as intermediate outcomes from a state-of-the-art AMT approach. Through extensive evaluations conducted on piano recordings under varied acoustic conditions across different transcription models, audio features, and dynamic time warping variants, we illustrate the advantages of our proposed method in both audio–audio and audio–score synchronization tasks. Specifically, we emphasize the effectiveness of our approach in aligning historical piano recordings with poor audio quality. We underscore how additional fine-tuning steps of the transcription model on the target dataset enhance alignment robustness, even in challenging acoustic environments.

Reviews

No reviews available