Which audio features can predict the dynamic musical emotions of both composers and listeners?
Eun Ji Oh (KAIST), Hyunjae Kim (KAIST), Kyung Myun Lee (KAIST)*
Keywords: Musical features and properties -> musical affect, emotion and mood; Human-centered MIR -> user behavior analysis and mining, user modeling; Human-centered MIR -> user-centered evaluation; Musical features and properties -> expression and performative aspects of music
Are composers’ emotional intentions conveyed to listeners through audio features? In the field of Music Emotion Recognition (MER), recent efforts have been made to predict listeners' time-varying perceived emotions using machine-learning models. However, interpreting these models has been challenging due to their black-box nature. To increase the explainability of models of subjective emotional experience, we focus on composers’ emotional intentions. Our study aims to determine which audio features effectively predict both composers' time-varying emotions and listeners' perceived emotions. Seven composers performed 18 piano improvisations expressing three types of emotions (joy/happiness, sadness, and anger), which 36 participants then listened to in a laboratory setting. Both composers and listeners continuously rated the emotional valence of the music clips on a 9-point scale (1: 'very negative' to 9: 'very positive'). Linear mixed-effects model analyses revealed that listeners significantly perceived the composers' intended emotions. Regarding audio features, root-mean-square (RMS) energy was found to modulate the degree to which listeners' perceived emotions resembled composers' emotions across all emotion types. Moreover, the audio features that significantly influenced this relationship varied by emotion type. We propose that audio features related to the emotional responses of both composers and listeners can be considered key factors in predicting listeners' emotional responses.
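The following is a minimal sketch of the kind of analysis the abstract describes: extracting frame-wise RMS energy from an improvisation and fitting a linear mixed-effects model in which RMS moderates how closely listeners' continuous valence ratings track the composer's valence. The file names, column names (listener_valence, composer_valence, rms, listener_id), and data layout are illustrative assumptions, not the authors' actual pipeline.

```python
# Hedged sketch: RMS extraction plus a linear mixed-effects model relating
# listener valence to composer valence, moderated by RMS energy.
import librosa
import pandas as pd
import statsmodels.formula.api as smf

# Frame-wise RMS energy for one clip (assumed sample rate and hop length).
y, sr = librosa.load("improvisation_01.wav", sr=22050)
rms = librosa.feature.rms(y=y, hop_length=512)[0]

# Assumed table: one row per (listener, time frame) with time-aligned
# continuous ratings and the matching RMS value.
df = pd.read_csv("continuous_ratings.csv")

# Mixed-effects model: listener valence predicted by composer valence, RMS,
# and their interaction, with a random intercept per listener.
model = smf.mixedlm(
    "listener_valence ~ composer_valence * rms",
    data=df,
    groups=df["listener_id"],
)
result = model.fit()
print(result.summary())
```

A significant composer_valence × rms interaction in such a model would correspond to the reported finding that RMS modulates how strongly listeners' perceived valence follows the composer's intended valence.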