SymPAC: Scalable Symbolic Music Generation With Prompts And Constraints
Haonan Chen (ByteDance Inc.)*, Jordan B. L. Smith (TikTok), Janne Spijkervet (University of Amsterdam), Ju-Chiang Wang (ByteDance), Pei Zou (ByteDance Inc.), Bochen Li (University of Rochester), Qiuqiang Kong (ByteDance), Xingjian Du (University of Rochester)
Keywords: Generative Tasks -> artistically-inspired generative tasks; Creativity -> creative practice involving MIR or generative technology; Generative Tasks -> evaluation metrics; Generative Tasks -> interactions; Musical features and properties -> representations of music
Progress in the task of symbolic music generation may be lagging behind other tasks like audio and text generation, in part because of the scarcity of symbolic training data. In this paper, we leverage the greater scale of audio music data by applying pre-trained MIR models (transcription, beat tracking, structure analysis, etc.) to extract symbolic events and encode them into token sequences. To the best of our knowledge, this work is the first to demonstrate the feasibility of training symbolic generation models solely from extensive transcribed audio data. Furthermore, to enhance the controllability of the trained model, we introduce SymPAC (Symbolic Music Language Model with Prompting And Constrained Generation), which combines (a) prompt bars in the encoding and (b) a technique called Constrained Generation via Finite State Machines (FSMs) at inference time. We show the flexibility and controllability that this approach affords, which may be critical in making music AI useful to creators and users.
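The abstract names constrained generation via FSMs but does not spell out the mechanism. A minimal sketch of the general idea follows, under assumptions of our own: the token vocabulary, the transition table, and the grammar (bar -> position -> pitch -> duration) are all illustrative stand-ins, not the paper's actual encoding. The core move is to mask out logits for tokens the FSM's current state forbids before sampling.

```python
# Illustrative sketch of constrained generation via a finite state machine (FSM).
# The vocabulary and transitions below are hypothetical, not from SymPAC itself:
# they mimic how a symbolic-music token grammar could be enforced at inference
# time by masking disallowed next-token scores.

import random

VOCAB = ["BAR", "POS", "PITCH", "DUR"]

# FSM transition table: current state -> set of token types allowed next.
FSM = {
    "start": {"BAR"},
    "BAR":   {"POS"},
    "POS":   {"PITCH"},
    "PITCH": {"DUR"},
    "DUR":   {"POS", "BAR"},  # after a note, start a new note or a new bar
}

def constrained_sample(logits, state):
    """Mask tokens the FSM forbids, then pick the highest-scoring survivor."""
    allowed = FSM[state]
    masked = {tok: score for tok, score in logits.items() if tok in allowed}
    return max(masked, key=masked.get)

def generate(num_steps, seed=0):
    random.seed(seed)
    state, out = "start", []
    for _ in range(num_steps):
        # Stand-in for a language model's next-token score distribution.
        logits = {tok: random.random() for tok in VOCAB}
        tok = constrained_sample(logits, state)
        out.append(tok)
        state = tok  # the next FSM state is the last emitted token type
    return out

print(generate(8))  # every emitted token respects the FSM transitions
```

Because the mask is applied before selection, every generated sequence is grammatical by construction, regardless of what scores the model assigns; this is what lets user-specified constraints be honored at inference time without retraining.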