Emotion-based Piano Score Generation via Two-stage Transformer VAE
Jiahao Zhao (Kyoto University)*, Kazuyoshi Yoshii (Kyoto University)
With the advancement of generative models, controllable music generation (e.g., emotion-conditioned generation) has become increasingly feasible. However, balancing controllability against musicality and generative quality remains challenging. Existing latent-space methods, such as MusER, struggle to capture the complex patterns underlying emotional expression and music theory. In this late-breaking demo, we present a preliminary study on emotion-based piano score generation. Our approach aims to improve the controllability and accuracy of emotional content by disentangling musical elements in the latent space. In addition, we introduce a two-stage generation structure with pre-training to mitigate the scarcity of emotion-labeled datasets, thereby improving the quality of the generated content and the robustness of the generation process.
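To make the architecture concrete, the sketch below shows one plausible form such a two-stage Transformer VAE could take: stage 1 pre-trains the VAE unconditionally on a large unlabeled score corpus, and stage 2 fine-tunes on a small emotion-labeled set by injecting an emotion embedding into the latent code. All specifics (model dimensions, the additive emotion conditioning, the KL weight, the four-class emotion labels) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed design, not the paper's actual model) of a
# two-stage Transformer VAE for emotion-conditioned score generation.
import torch
import torch.nn as nn


class TransformerVAE(nn.Module):
    def __init__(self, vocab_size=512, d_model=256, latent_dim=128,
                 n_heads=4, n_layers=4, n_emotions=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)
        # Emotion label embedding, used only during stage-2 fine-tuning;
        # a 4-quadrant (valence/arousal) label set is assumed here.
        self.emotion_embed = nn.Embedding(n_emotions, latent_dim)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.from_latent = nn.Linear(latent_dim, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def encode(self, tokens):
        h = self.encoder(self.embed(tokens)).mean(dim=1)  # pooled summary
        return self.to_mu(h), self.to_logvar(h)

    def reparameterize(self, mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def decode(self, z, tokens):
        # Causal masking and target shifting for autoregressive training
        # are omitted here for brevity.
        memory = self.from_latent(z).unsqueeze(1)  # (batch, 1, d_model)
        h = self.decoder(self.embed(tokens), memory)
        return self.out(h)

    def forward(self, tokens, emotion=None):
        mu, logvar = self.encode(tokens)
        z = self.reparameterize(mu, logvar)
        if emotion is not None:  # stage 2: condition the latent code
            z = z + self.emotion_embed(emotion)
        return self.decode(z, tokens), mu, logvar


def vae_loss(logits, targets, mu, logvar, beta=0.5):
    recon = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kld


model = TransformerVAE()
tokens = torch.randint(0, 512, (2, 64))  # dummy token sequences
# Stage 1: unconditional pre-training on unlabeled scores.
logits, mu, logvar = model(tokens)
loss = vae_loss(logits, tokens, mu, logvar)
# Stage 2: fine-tuning on the emotion-labeled subset.
logits, mu, logvar = model(tokens, emotion=torch.tensor([0, 3]))
```

In this reading, pre-training lets the decoder learn general musical structure from abundant unlabeled data, so the scarce emotion labels only need to steer the latent code rather than teach musicality from scratch; the disentanglement of emotion-related and structural latent factors is only hinted at by the additive conditioning here.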