Abstract:

We introduce Cadenza, a new multi-stage generative framework for predicting expressive variations of symbolic musical ideas and for unconditional generation. To accomplish this, we introduce a novel MIDI encoding method, PerTok (Performance Tokenizer), that captures minute expressive details whilst maintaining short sequence lengths and small vocabulary sizes for polyphonic, monophonic and rhythmic tasks. The proposed framework comprises two sequential stages: 1) Composer and 2) Performer. The Composer model is a transformer-based Variational Autoencoder (VAE) with Rotary Positional Embeddings (RoPE) and an autoregressive decoder modified to integrate the latent codes of the input musical idea more effectively. The Performer model is a bidirectional transformer encoder that is trained separately to predict velocities and microtimings on MIDI sequences. Extensive human evaluations demonstrate Cadenza's versatility in two respects: first, it meets or surpasses the musical quality of other state-of-the-art symbolic models in unconditional generation; second, it composes new, expressive ideas that are stylistically related to the input whilst still offering novel material to the user. Our framework is designed, researched and implemented with the objective of ethically providing inspiration for musicians.
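
For readers unfamiliar with the Rotary Positional Embeddings mentioned above, the following is a minimal sketch of the generic RoPE rotation in plain Python. It is not taken from Cadenza: the abstract does not say how the Composer applies RoPE, so the function names `rope` and `dot`, the base of 10000 and the pairing of adjacent dimensions are all assumptions chosen for illustration. The sketch shows the property that makes RoPE attractive for sequence models: after rotation, the dot product of a query and a key depends only on their relative offset.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive (even, odd) pairs of `vec` by position-dependent angles.

    Hypothetical helper for illustration; not Cadenza's implementation.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(0, dim, 2):
        # Pair i/2 uses frequency base^(-i/dim); lower pairs rotate faster.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2-D rotation of the pair (x, y) by angle theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.25]
    # Both pairs of positions have the same offset (2), so the scores agree
    # up to floating-point error.
    print(dot(rope(q, 5), rope(k, 3)))
    print(dot(rope(q, 12), rope(k, 10)))
```

In an attention layer the same rotation would be applied to every query and key vector before the score is computed, so no separate positional embedding needs to be added to the token embeddings.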


© 2024 International Society for Music Information Retrieval