P1-03: Data Collection in Music Generation Training Sets: A Critical Analysis
Fabio Morreale (University of Auckland)*, Megha Sharma (University of Tokyo), I-Chieh Wei (University of Auckland)
Subjects (starting with primary): Philosophical and ethical discussions -> ethical issues related to designing and implementing MIR tools and technologies; MIR tasks -> music generation; Philosophical and ethical discussions -> legal and societal aspects of MIR
Presented In Person: 4-minute short-format presentation
The practices of data collection in training sets for Automatic Music Generation (AMG) tasks are opaque and overlooked. In this paper, we aimed to identify these practices and surface the values they embed. We systematically identified all datasets used to train AMG models presented at the last ten editions of ISMIR. For each dataset, we checked how it was populated and the extent to which musicians wittingly contributed to its creation. Almost half of the datasets (42.6%) were indiscriminately populated by accumulating music data available online without seeking any sort of permission. We discuss the ideologies that underlie this practice and propose a number of suggestions AMG dataset creators might follow. Overall, this paper contributes to the emerging self-critical corpus of work of the ISMIR community, reflecting on the ethical considerations and the social responsibility of our work.