Toward a More Complete OMR Solution
Guang Yang (University of Washington)*, Muru Zhang (University of Washington), Lin Qiu (University of Washington), Yanming Wan (University of Washington), Noah A. Smith (University of Washington and Allen Institute for AI)
Keywords: Evaluation, datasets, and reproducibility -> evaluation metrics; Knowledge-driven approaches to MIR -> machine learning/artificial intelligence for music, MIR tasks -> optical music recognition
Optical music recognition (OMR) aims to convert music notation into digital formats. One approach to OMR is a multi-stage pipeline, in which a model first detects visual music notation elements in the image (object detection) and then assembles them into a music notation representation (notation assembly). In this paper, we focus on the MUSCIMA++ v2.0 dataset, which represents musical notation as a graph whose edges encode pairwise relationships among detected music objects. Most previous work on notation assembly unrealistically assumes perfect object detection. In this study, we consider both stages together. First, we introduce a music object detector based on YOLOv8, which improves detection performance. Second, we introduce a supervised training pipeline that completes the notation assembly stage from detection output. We find that this model outperforms existing models trained on perfect detection output, showing the benefit of considering the detection and assembly stages more holistically. These findings are an important step toward a more complete OMR solution.
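The notation-assembly stage described above can be sketched as pairwise edge prediction over detected objects. The feature layout, class ids, and scorer below are illustrative assumptions, not the paper's actual model: a trained classifier would replace the hand-written `toy_scorer`.

```python
# Hypothetical sketch of notation assembly as pairwise edge prediction:
# each detected object becomes a graph node, and a learned scorer decides
# which pairs of nodes are connected. All names and features here are
# illustrative assumptions, not the paper's implementation.
from itertools import combinations

def pairwise_features(a, b):
    """Concatenate class ids with relative bounding-box offsets and scale ratios."""
    ax, ay, aw, ah = a["bbox"]
    bx, by, bw, bh = b["bbox"]
    return [a["cls"], b["cls"], bx - ax, by - ay, bw / aw, bh / ah]

def assemble(objects, score_fn, threshold=0.5):
    """Predict an edge for every object pair whose score clears the threshold."""
    edges = []
    for i, j in combinations(range(len(objects)), 2):
        if score_fn(pairwise_features(objects[i], objects[j])) >= threshold:
            edges.append((i, j))
    return edges

# Toy scorer standing in for a trained classifier: connect a notehead
# (cls 0) to a stem (cls 1) only when they are horizontally close.
def toy_scorer(feats):
    a_cls, b_cls, dx, dy, _, _ = feats
    return 1.0 if {a_cls, b_cls} == {0, 1} and abs(dx) < 10 else 0.0

detections = [
    {"cls": 0, "bbox": (100, 50, 12, 10)},  # notehead
    {"cls": 1, "bbox": (108, 10, 2, 45)},   # nearby stem: gets an edge
    {"cls": 1, "bbox": (300, 10, 2, 45)},   # distant stem: no edge
]
print(assemble(detections, toy_scorer))  # [(0, 1)]
```

Because the scorer sees detector output rather than ground-truth objects, training it on imperfect detections is what lets the two stages be optimized jointly, as the abstract argues.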