A New Dataset, Notation Software, and Representation for Computational Schenkerian Analysis
Stephen Hahn (Duke)*, Weihan Xu (Duke), Zirui Yin (Duke University), Rico Zhu (Duke University), Simon Mak (Duke University), Yue Jiang (Duke University), Cynthia Rudin (Duke)
Keywords: Human-centered MIR -> music interfaces and services; MIR fundamentals and methodology -> symbolic music processing; Musical features and properties -> structure, segmentation, and form; Evaluation, datasets, and reproducibility -> novel datasets and use cases
Schenkerian Analysis (SchA) is a uniquely expressive method of music analysis, combining elements of melody, harmony, counterpoint, and form to describe the hierarchical structure supporting a work of music. However, despite its powerful analytical utility and potential to improve music understanding and generation, SchA has rarely been utilized by the computer music community. This is in large part due to the paucity of available high-quality data in a computer-readable format. With a larger corpus of Schenkerian data, it may be possible to infuse machine learning models with a deeper understanding of musical structure, leading to more "human" results. To encourage further research in Schenkerian analysis and its potential benefits for music informatics and generation, this paper presents three main contributions: 1) a new and growing dataset of SchAs, the largest in human- and computer-readable formats to date (>140 excerpts), 2) a novel software for visualization and collection of SchA data, and 3) a novel, flexible representation of SchA as a heterogeneous-edge graph data structure.
Reviews
All the reviewers appreciated that the paper is well-written and clear, with an appropriate bibliography, and is scientifically sound, making it a valuable contribution to the community.
All reviewers remarked on the availability of the data. Suggestions for improvement were also made: to explain the dataset, its metadata, and its links to other datasets in more detail (R2, R3, MR), to provide more details on the notation software (R2), and to improve and clarify the figures (all reviewers).
Please take these comments into account for the final version of your paper.
I warmly congratulate the authors on their work! I am pleased to see initiatives that move from black-box models to generative tasks and focus on more interpretable models, especially applied to complex and often subjective tasks like Schenkerian analysis. The brief explanation of Schenkerian analysis in Section 1.1 helps readers who are not experts in the subject to gain a basic understanding of what the process entails. Furthermore, throughout the work, the authors take care to discuss their points musically, always using the same example.
I have a few minor suggestions for the authors:
- In lines 219 to 221, the authors mention that the dataset consists of pieces from the common practice period, Gentle Giant, and others. I must admit, I am very curious to know what falls in between! For example, are there rock pieces with simpler melodies/harmonies, such as those by Elvis Presley or the Beatles? As a big fan of Gentle Giant, I'm aware that they make extensive use of ideas from common practice in their pieces (e.g., Knots, In Reflection), but with a significant harmonic and melodic modernization. To what extent can the usual Schenkerian analysis capture these nuances? I ask this because I am aware of a work of Schenkerian analysis applied to Brazilian Popular Music (https://www.scielo.br/j/pm/a/Q93rFfZWy49cm9xjb3mwCQt/?format=pdf in Portuguese -- I hope the authors can read it, as I do not know their nationality), and some adaptations are necessary to capture specific nuances of the aesthetic.
- Will the software be made public? Open-source? Freeware? I understand that the software may still be unsuitable for widespread use in its current state, but it is important to mention the future of the software for readers who might want to use it.
- I know that positioning figures in LaTeX is difficult, but Figs. 2 and 3 appear after Fig. 4. It would be good to fix that.
- Using a single color in the bar graph of Fig. 3 would make it easier to read.
- In Fig. 4, having an x-axis tick at every integer would be helpful.
- The proposal could also be used to assist in teaching Schenkerian analysis! Perhaps this is something important to mention.
I conclude my comments once again by congratulating the authors for tackling a complex problem!
The paper is well-written and clearly organized, with a thorough bibliography. The context of Schenkerian analysis is well set. The encoding formats and graphical representations (Fig. 2b) are convincing. The paper has three main contributions. Whereas the graph structure is well detailed in Section 3 (and supported by Figure 5, which is enlightening), I would expect more details for both the dataset and especially the notation software.
The dataset:
- Its creation process could be described in slightly more detail. Regardless of the names of the analysts (who could even remain anonymous after publication), it would be useful to know more about their respective experience (experts in music composition, musicology, Schenkerian analysis? level/years of practice?). Were there any annotation guidelines specific to the format or tool used, or any potential sources of disagreement (potentially reduced by a reviewing process between analysts, for example)?
- Indicate where the dataset will be available, even in a submission, with a fake/anonymous URL. As the dataset is meant to grow, it also becomes important to number the versions/releases of the dataset to make experiments reproducible (allowing statements such as "We trained our model on 'SchA' Dataset V1.1.7 (180 samples)") and to keep track of its evolution.
- In that sense, I recommend saving the JSON files (JSON is already a sensible choice for machine- and human-readability) in unfolded form, to make them more human-readable and easier to track in version control systems. (In the supplementary material, the provided samples are one-liners.)
- Stick with the current number of analyses, 145, even if future versions will include more.
- The analysis of some dataset statistics is appreciated. Some comments in l.243-268 are important for understanding the figures/table and should be added to their captions, to make the figures more readable and more self-contained.
- Provide more information about the metadata, if not in the paper, then at least in a summary file at the dataset location.
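As a minimal sketch of the unfolded-JSON suggestion above, re-serializing a one-line file with Python's standard `json` module could look like the following. The helper name `unfold_json` and the field names in the sample record are hypothetical and are not taken from the dataset.

```python
import json


def unfold_json(one_line: str, indent: int = 2) -> str:
    """Re-serialize a one-line JSON string in an indented, diff-friendly form.

    Assumption: key order carries no meaning, so keys are sorted to keep
    the output stable across re-saves.
    """
    data = json.loads(one_line)
    # indent puts every key and list element on its own line;
    # ensure_ascii=False keeps non-ASCII characters readable.
    return json.dumps(data, indent=indent, sort_keys=True, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    # Hypothetical record; the real dataset schema may differ.
    sample = '{"title": "example", "notes": [60, 62, 64]}'
    print(unfold_json(sample), end="")
```

With one value per line, a change to a single note shows up as a one-line diff in version control rather than as a rewrite of the whole file.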
The notation software:
- In the current state, the dataset and graph representation are the main contributions of the paper, and they are valuable. The software appears primarily as a tool that the authors (or analysts) used to produce and read the data. But it is not well documented, although it appears in the title.
- Among other questions:
  - What technology or language is used?
  - Is it a script to run, an executable, a web application? This can be critical for accessibility if the goal is to facilitate SchA annotation at scale.
  - How does the user interact with the system? If it displays the graphical representation of a JSON annotation, that is already interesting, but it needs to be explicitly mentioned. Can users interact with the graphical representation, with the JSON, or both? What is an example of "simple commands" (l.275)?
- l.279: In this context, the formulation "We are currently working on..." leaves a feeling of unfinished work. Nevertheless, it is completely fine to indicate what the features of the current version are, and then separately mention other features to be included in a future release. (minor issue)
I believe the contributions of this paper are valuable, but it has to be made clear whether:
- 1. the software can be used as it is by the community, and it is a major contribution: in this case, more details are needed to understand how;
- or 2. it is a tool to facilitate building the dataset: in this case, it is a program that is part of the dataset's creation process; it is not the main topic of this paper and can be removed from the title. This framework is still valuable, and the code is usable by the community.
Minor comments on the form:
- If possible, improve the position of the figures so that more references are on the same pages as their figures, especially Figures 2, 3 and 4. Try splitting 3a and 3b into two separate figures (even if they stay at the same position). As mentioned before, captions can be more explicit.
- Some uses of quotation marks create unnecessary distance. l.126 "fractal" is justified if it comes from the reference (same for "goodness metric" l.153), but "big data" l.23, "atonal" l.74, "human" l.13 (?), and "ground truth" l.198 are better-established concepts that can be in plain text, or in italics if they really need to stand out.
- Sections 1.1 and 1.2 could form a standalone Section 2 ("History of computational SchA / State of the art / Related work"), separate from the introduction, which already reads as concluded after the outline in l.101-105.
This paper introduces an extension to a previously little-explored dataset, incorporating Schenkerian analyses of classical music. The newly added data focuses mostly on short monophonic fugal themes from the Baroque era. In my view, any open-source dataset constitutes a valuable asset for the community. Moreover, the availability of expert musicological annotations for Schenkerian analysis is both rare and precious.
Here are some comments and suggestions for the authors:
- Availability of Data: It is mentioned in the paper that the dataset contains 145 new excerpts, but there is no clear indication in the paper regarding whether this data will be made publicly available. If it is intended to be public, I recommend including an anonymized link in the paper or explicitly stating its public availability in the final version.
- Clarity on Dataset Composition: From the examples provided and the supplementary material, it is implied that most of the data consists of short monophonic melodies. If this is the case, it would be beneficial for the authors to state it explicitly in the text; otherwise, I would strongly suggest including a more complete polyphonic example, since space allows it. The excerpts depicted in Figure 4 may not fully convey this aspect, so clarifying it within the text would help.
- Clarification on Clustering Example: The clustering example presented in Figure 5 prompts the question of whether only pairs of note contractions are allowed for each level transition. In Schenkerian analysis, multiple notes can be reduced from the Foreground to the Middleground in a single step. While this doesn't pose a significant issue, it may be prudent to mention this fact within the paper to enhance clarity.
- Integration with Pre-existing Datasets: A key aspect of any new dataset paper is linking the new data with existing datasets. This is particularly crucial for smaller-scale musicological datasets. Given that the dataset in this paper includes fugal themes from the Well-Tempered Clavier (WTC), it could complement the ALGOMUS Bach fugues dataset effectively. I strongly recommend that the authors associate any related fugal themes from their dataset with existing datasets in the future (and add it as future work), as multi-level analyses are especially valuable in sparse musicological datasets.
We added information about the analysts, the availability of the dataset and notation software, and the technology used for the notation software. Additionally, we made several small edits for clarity, such as adding plot ticks and rearranging figure placement.