Transcription-based lyrics embeddings: simple extraction of effective lyrics embeddings from audio

Jaehun Kim (Pandora / SiriusXM)*, Florian Henkel (SiriusXM + Pandora), Camilo Landau (Pandora / SiriusXM), Samuel E. Sandberg (SiriusXM + Pandora), Andreas F. Ehmann (SiriusXM + Pandora)

Keywords: MIR fundamentals and methodology -> lyrics and other textual data; Applications -> music recommendation and playlist generation; MIR fundamentals and methodology -> multimodality; MIR tasks -> automatic classification; Musical features and properties -> representations of music

Abstract:

The majority of Western popular music contains lyrics. Previous studies have shown that lyrics are a rich source of information and are complementary to other information sources, such as audio. One factor that hinders the research and application of lyrics on a large scale is their availability. To mitigate this, we propose the use of transcription-based lyrics embeddings (TLEs). These estimate 'ground-truth' lyrics embeddings given only audio as input. Central to this approach is the use of transcripts derived from an automatic lyrics transcription (ALT) system instead of human-transcribed, 'ground-truth' lyrics, making them substantially more accessible. We conduct an experiment to assess the effectiveness of TLEs across various music information retrieval (MIR) tasks. Our results indicate that TLEs can improve on the performance of audio embeddings alone, especially when the two are combined, closing the gap with cases where ground-truth lyrics information is available.
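
The pipeline described in the abstract (audio, then ALT transcript, then text embedding, then combination with an audio embedding) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in and not the authors' implementation: `toy_text_embedding` is a hashed bag-of-words substitute for a real text encoder, `fake_alt` substitutes for an ALT system, and concatenation is only one assumed way of "combining" the two embeddings, since the abstract does not specify the method.

```python
import hashlib
import math
from typing import Callable, List, Sequence


def toy_text_embedding(text: str, dim: int = 16) -> List[float]:
    """Hypothetical stand-in for a text encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Map each token to a bucket with a deterministic hash.
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def transcription_based_lyrics_embedding(
    audio: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],
    embed_text: Callable[[str], List[float]] = toy_text_embedding,
) -> List[float]:
    """Embed the ALT transcript of the audio instead of human-transcribed lyrics."""
    transcript = transcribe(audio)
    return embed_text(transcript)


def combine(audio_embedding: Sequence[float], lyrics_embedding: Sequence[float]) -> List[float]:
    """Assumed combination strategy: plain concatenation of the two vectors."""
    return list(audio_embedding) + list(lyrics_embedding)


def fake_alt(audio: Sequence[float]) -> str:
    """Hypothetical ALT stub that ignores the audio and returns a fixed transcript."""
    return "hello hello world"


audio_signal = [0.0, 0.1, -0.1, 0.05]   # placeholder waveform samples
audio_embedding = [0.2, 0.4, 0.6]       # placeholder audio embedding
tle = transcription_based_lyrics_embedding(audio_signal, fake_alt)
combined = combine(audio_embedding, tle)
```

In this sketch, swapping `fake_alt` for a real transcription system and `toy_text_embedding` for a real text encoder leaves the structure unchanged: the only input required is the audio itself.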
