LP-10: On the Use of Synthesized Datasets and Transformer Adaptors for Musical Instrument Recognition

Tanaka, Keitaro*, Luo, Yin-Jyun, Cheuk, Kin Wai, Yoshii, Kazuyoshi, Morishima, Shigeo, Dixon, Simon

Abstract: This paper investigates training methods for musical instrument recognition (IR). Many studies have tackled IR and improved performance, but mostly on a few primary datasets (e.g., the OpenMIC dataset); IR on other datasets, especially small ones, is yet to be studied. We propose to utilize a large synthesized dataset for IR on small datasets of real-world samples. Specifically, we first pre-train a transformer-based model on the Slakh2100 dataset to initialize its weights, and then fine-tune the model on the target datasets. We compare our approach with the adaptor approach, which is widely known to be effective for fine-tuning large language models. We also investigate how IR performance changes when only a limited number of samples from the target dataset are available. Our experiments show that weight initialization consistently outperforms both training from scratch and the adaptor approach.
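The two fine-tuning strategies the abstract compares can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the backbone, layer sizes, and the `Adapter` and `fine_tune_params` helpers are hypothetical stand-ins. It contrasts full fine-tuning (all pre-trained weights updated) with a bottleneck-adaptor setup in which the backbone is frozen and only small inserted modules plus the classification head are trained.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adaptor with a residual connection (illustrative)."""

    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, dim)    # project back up

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))


class TinyTransformer(nn.Module):
    """Stand-in for a transformer-based IR backbone (hypothetical sizes)."""

    def __init__(self, dim=32, n_classes=10, use_adapters=False):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            for _ in range(2)
        )
        self.adapters = (
            nn.ModuleList(Adapter(dim) for _ in range(2)) if use_adapters else None
        )
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):  # x: (batch, time, dim)
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if self.adapters is not None:
                x = self.adapters[i](x)  # adaptor after each encoder layer
        return self.head(x.mean(dim=1))  # pool over time, then classify


def fine_tune_params(model, adapters_only):
    """Select trainable parameters for one of the two strategies.

    adapters_only=False: full fine-tuning from pre-trained initialization.
    adapters_only=True:  freeze the backbone; train adaptors and head only.
    """
    for name, p in model.named_parameters():
        p.requires_grad = (not adapters_only) or name.startswith(("adapters", "head"))
    return [p for p in model.parameters() if p.requires_grad]
```

In this sketch the adaptor setting trains far fewer parameters than full fine-tuning, which is the usual motivation for adaptors on small target datasets; the abstract's finding is that, for IR, initializing all weights from Slakh2100 pre-training and fine-tuning fully still works better.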