Abstract:

Much of the research in Music Information Retrieval (MIR) relies on data: collections of audio recordings, musical scores, meta-information about music, human opinions, and more. The International Society for MIR (ISMIR) Conference offers a glimpse into the data being used in MIR research and thus into what is (and is not) studied by the field. Although past ISMIR conference proceedings carry consistent metadata on each paper, no comprehensive repository exists with detailed metadata on the datasets used in ISMIR papers. We propose a collection tool for community-sourcing meta-information about the datasets used in ISMIR papers, as well as a procedure for processing and cleaning the collected data. We view this as a necessary first step towards quantifying the influence of particular datasets across MIR and towards addressing questions of data diversity and generalizability of MIR work.