Abstract:

Recent years have seen many audio-domain text-to-music generation models that rely on large amounts of text-audio pairs for training. However, symbolic-domain controllable music generation has lagged behind, largely due to the lack of a large-scale symbolic music dataset with extensive metadata and captions. In this paper, we present MetaScore, a new dataset consisting of 963K musical scores paired with rich metadata collected from an online music forum, along with generated pseudo captions. With MetaScore, we explore tag- and text-based controllable symbolic music generation. Both subjective and objective evaluations showcase the potential of our dataset for tag- and text-conditioned music generation.