Enhancement of Speech and Language Models through Unsupervised Learning with Music Datasets
Eviatar Bas (Independent)*, Iran R Roman (Queen Mary University of London)
This paper will be presented in person
Music processing is known to emerge spontaneously in early development, yet the impact of music exposure on brain development and cognitive abilities, such as language acquisition, remains unclear. In this study, we investigated an analogous effect in artificial neural networks by training identical autoencoders to reconstruct speech excerpts using datasets with varying proportions of music. We then assessed each model's learned representations via transfer learning, applying the encoder to a language classification task. Our findings indicate that incorporating a small share of music into the pre-training data improved the ability of a model pre-trained on English data to classify other languages, including Japanese and Korean, suggesting that music exposure may enhance model generalization to linguistically distant languages.
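The pipeline described above, pre-train an autoencoder on audio frames and then reuse its frozen encoder as the front end of a language classifier, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the 80-dimensional log-mel-style features, the three-way language head, and the random stand-in data are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Toy frame-level autoencoder standing in for the paper's model."""
    def __init__(self, n_feats=80, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_feats, 128), nn.ReLU(), nn.Linear(128, latent))
        self.decoder = nn.Sequential(
            nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, n_feats))

    def forward(self, x):
        return self.decoder(self.encoder(x))

torch.manual_seed(0)
ae = Autoencoder()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

# Stand-in for a pre-training set mixing speech and music frames;
# in the study this proportion of music is the manipulated variable.
frames = torch.randn(256, 80)
for _ in range(5):  # a few reconstruction steps for illustration
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(frames), frames)
    loss.backward()
    opt.step()

# Transfer learning: freeze the pre-trained encoder and attach a
# classifier head (3 hypothetical classes, e.g. English/Japanese/Korean).
for p in ae.encoder.parameters():
    p.requires_grad = False
clf = nn.Sequential(ae.encoder, nn.Linear(32, 3))
logits = clf(frames)
print(tuple(logits.shape))  # (256, 3)
```

Only the small linear head is trained during transfer; the frozen encoder supplies whatever structure it absorbed from the speech-plus-music pre-training mix.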