The ListenBrainz Listens Dataset
Kartik Ohri (MetaBrainz Foundation Inc.)*, Robert Kaye (MetaBrainz Foundation Inc.)
Keywords: Applications -> digital libraries and archives; Applications -> music recommendation and playlist generation; MIR fundamentals and methodology -> metadata, tags, linked data, and semantic web; Evaluation, datasets, and reproducibility -> novel datasets and use cases
The ListenBrainz listens dataset is a continually evolving repository of music listening history events submitted by all ListenBrainz users. It currently totals over 800 million entries. Each datum contains a timestamp, a pseudonymous user identifier, track metadata, and, optionally, MusicBrainz identifiers that link it to external resources and datasets. This paper discusses the acquisition of the raw data, the subsequent steps of data synthesis and cleaning, the contents of the refined dataset, and its potential applications. Although it is not (yet) the largest dataset of music listening events, it is distinctive in that it keeps evolving, with users contributing data daily. This paper underscores the value of the ListenBrainz listens dataset as an asset for researchers and practitioners alike, offering insights into music consumption patterns and user preferences, and opening avenues for further exploration in music information retrieval and recommendation systems.
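As an illustration of the datum described above, the following minimal sketch models one listening event in Python. The field names (`listened_at`, `user_id`, `track_name`, `artist_name`, `recording_mbid`) and the one-JSON-object-per-line layout are assumptions made for this example; they are not taken from the paper and may not match the dataset's actual schema.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Listen:
    """One listening event. All field names here are hypothetical."""
    listened_at: int                      # timestamp, assumed to be Unix epoch seconds
    user_id: int                          # pseudonymous user identifier
    track_name: str                       # track metadata
    artist_name: str                      # track metadata
    recording_mbid: Optional[str] = None  # optional MusicBrainz identifier


def parse_listen(line: str) -> Listen:
    """Parse one JSON-encoded line into a Listen (assumed flat layout)."""
    raw = json.loads(line)
    return Listen(
        listened_at=int(raw["listened_at"]),
        user_id=int(raw["user_id"]),
        track_name=raw["track_name"],
        artist_name=raw["artist_name"],
        # The identifier is optional, so a missing key maps to None.
        recording_mbid=raw.get("recording_mbid"),
    )


def is_linkable(listen: Listen) -> bool:
    """True when the listen carries an identifier usable for external linkage."""
    return listen.recording_mbid is not None
```

A record parsed this way can be filtered with `is_linkable` before joining against external resources, since only listens that carry a MusicBrainz identifier can be linked directly.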