Using Item Response Theory to Aggregate Music Annotation Results of Multiple Annotators
Tomoyasu Nakano (National Institute of Advanced Industrial Science and Technology (AIST))*, Masataka Goto (National Institute of Advanced Industrial Science and Technology (AIST))
Keywords: MIR tasks -> music transcription and annotation
Human music annotation is one of the most important tasks in music information retrieval (MIR) research. Results of labeling, tagging, assessment, and evaluation can be used as training data for machine learning models that estimate such annotations automatically. For such machine learning purposes, a single target (e.g., a song) is usually annotated by multiple human annotators, and the results are aggregated by majority voting or averaging. Majority voting, however, requires an odd number of annotators, which is not always possible, while averaging is sensitive to differences in the judgmental characteristics of individual annotators and cannot be used for ordinal scales. This paper therefore proposes that item response theory (IRT) be used to aggregate the music annotation results of multiple annotators. IRT-based models can jointly estimate annotators' characteristics and latent scores (i.e., aggregations of the annotation results) of the targets, and they are also applicable to ordinal scales. We evaluated the IRT-based models in two actual cases of music annotation --- semantic tagging of music and Likert scale-based evaluation of singing skill --- and compared them with simplified models that do not consider the characteristics of each annotator.
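To make the idea of joint estimation concrete, here is a minimal sketch of an IRT-style aggregation for binary tag annotations. This is not the paper's model: the Rasch-style parameterization P(y=1) = sigmoid(theta_i + b_j), the gradient-ascent fitting, and the toy data are illustrative assumptions. Each annotator j gets a leniency parameter b[j] and each item i a latent score theta[i], estimated jointly rather than by simple voting or averaging.

```python
# Illustrative Rasch-style IRT aggregation (an assumption, not the paper's
# exact model): annotator leniency b[j] and latent item score theta[i] are
# fit jointly by gradient ascent on the Bernoulli log-likelihood.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_irt(labels, n_items, n_annotators, lr=0.1, epochs=500):
    """labels: list of (item, annotator, 0/1 judgment) triples."""
    theta = [0.0] * n_items      # latent item scores (the aggregated result)
    b = [0.0] * n_annotators     # annotator leniency ("yes" tendency)
    for _ in range(epochs):
        g_t = [0.0] * n_items
        g_b = [0.0] * n_annotators
        for i, j, y in labels:
            p = sigmoid(theta[i] + b[j])
            g_t[i] += y - p          # log-likelihood gradient w.r.t. theta[i]
            g_b[j] += y - p          # ... and w.r.t. b[j]
        for i in range(n_items):
            theta[i] += lr * g_t[i]
        for j in range(n_annotators):
            # slight shrinkage on b resolves the additive shift ambiguity
            b[j] += lr * (g_b[j] - 0.01 * b[j])
    return theta, b

# Hypothetical toy data: 3 items tagged by 5 annotators.
# Item 0 receives 1 "yes", item 1 receives 3, item 2 receives 5;
# annotator 0 says "yes" to everything (a lenient annotator).
votes = [[1, 0, 0, 0, 0],
         [1, 1, 1, 0, 0],
         [1, 1, 1, 1, 1]]
labels = [(i, j, votes[i][j]) for i in range(3) for j in range(5)]
theta, b = fit_irt(labels, n_items=3, n_annotators=5)
```

After fitting, theta orders the items by latent score while b exposes each annotator's bias, which plain averaging would fold into the item scores. For the Likert-scale case in the abstract, a graded response model replaces the single sigmoid with cumulative category-boundary probabilities, so the same joint-estimation idea carries over to ordinal judgments.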