Abstract:

Optical music recognition (OMR) aims to convert images of music notation into machine-readable formats. One approach to OMR is a multi-stage pipeline in which a model first detects the visual elements of music notation in the image (object detection) and then assembles them into a structured representation of the notation (notation assembly). In this paper, we focus on the MUSCIMA++ v2.0 dataset, which represents musical notation as a graph, so that notation assembly amounts to predicting the pairwise relationships among detected music objects. Most previous work on notation assembly unrealistically assumes perfect object detection; in this study, we consider both stages together. First, we introduce a music object detector based on YOLOv8, which improves detection performance. Second, we introduce a supervised training pipeline that completes the notation assembly stage on top of the detector's output. We find that this model outperforms existing models trained on perfect detection output, demonstrating the benefit of treating the detection and assembly stages more holistically. These findings are an important step toward a more complete OMR solution.
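
To illustrate how notation assembly can be framed as pairwise relationship prediction over detector output, the following is a minimal sketch. It is not the paper's implementation: the `PairwiseAssembler` class, its feature choice (class embedding plus bounding box), layer sizes, and the example class ids are all illustrative assumptions.

```python
# Hedged sketch: notation assembly as pairwise link prediction over detected
# music objects. All names, features, and dimensions are assumptions for
# illustration, not the architecture described in the paper.
import torch
import torch.nn as nn


class PairwiseAssembler(nn.Module):
    """Scores whether a relationship (graph edge) exists between two detected objects."""

    def __init__(self, num_classes: int, embed_dim: int = 32):
        super().__init__()
        self.class_embed = nn.Embedding(num_classes, embed_dim)
        # Each object is described by its class embedding plus its bounding box
        # (x_center, y_center, width, height); a candidate pair concatenates both.
        self.mlp = nn.Sequential(
            nn.Linear(2 * (embed_dim + 4), 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, boxes: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        """boxes: (N, 4) detector output; labels: (N,) predicted class ids.
        Returns an (N, N) matrix of edge logits."""
        feats = torch.cat([self.class_embed(labels), boxes], dim=-1)  # (N, D)
        n = feats.size(0)
        pairs = torch.cat(
            [feats.unsqueeze(1).expand(n, n, -1),
             feats.unsqueeze(0).expand(n, n, -1)],
            dim=-1,
        )
        return self.mlp(pairs).squeeze(-1)


# Usage: suppose a YOLOv8-style detector returned three objects
# (e.g. two noteheads and a stem); class ids here are hypothetical.
boxes = torch.tensor([[0.10, 0.40, 0.020, 0.02],
                      [0.11, 0.30, 0.005, 0.20],
                      [0.30, 0.40, 0.020, 0.02]])
labels = torch.tensor([0, 1, 0])
model = PairwiseAssembler(num_classes=5)
edge_logits = model(boxes, labels)          # (3, 3) relationship scores
adjacency = edge_logits.sigmoid() > 0.5     # thresholded notation graph
print(adjacency)
```

In this framing, training the assembler directly on (imperfect) detector output rather than on ground-truth objects is what lets the two stages be considered together.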

