Abstract:

The scarcity of symbolic music datasets has long been a challenge in the field of music information retrieval. Many studies have emphasized the need for high-quality, manually annotated datasets that include multifaceted labels or focus on underrepresented periods such as the Romantic period. In this paper, we present S3, the Symbolic Symphony Set, a comprehensive collection of four symphonies, totalling 16 movements, by Mozart, Beethoven, Dvorak, and Tchaikovsky. The dataset includes XML files and detailed annotations of notes and of musical structure along both the horizontal and vertical dimensions, commonly known as form-related and texture-related information, respectively. The note annotations are semi-automatically generated. Form-related information includes form analysis, cadence, and harmony, while texture-related information includes the role (melody, rhythm, harmony, or mixed) of each instrument. All annotations have been converted into CSV format to facilitate further analysis and modeling. Additionally, manually annotated PDF files are included in the dataset for reference. Our dataset is available at https://github.com/iis-mctl/mctl-symphony-dataset.