P5-14: Inversynth II: Sound Matching via Self-Supervised Synthesizer-Proxy and Inference-Time Finetuning

Oren Barkan (Microsoft), Shlomi Shvartzman (Tel Aviv University), Noy Uzrad (Tel Aviv University), Moshe Laufer (Tel Aviv University), Almog Elharar (Tel Aviv University), Noam Koenigstein (Tel Aviv University)*

Subjects (starting with primary): MIR tasks -> music synthesis and transformation; MIR tasks -> music generation

Presented Virtually: 4-minute short-format presentation

Abstract:

Synthesizers are widely used electronic musical instruments. Given an input sound, inferring the underlying synthesizer parameters that reproduce it is a difficult task known as sound matching. In this work, we tackle automatic sound matching, a task otherwise performed manually by professional audio experts. Our first contribution is a differentiable synthesizer-proxy that enables gradient-based optimization by comparing the input and reproduced audio signals. Our second contribution is a self-supervised finetuning mechanism that further refines the prediction at inference time. Together, these contributions yield state-of-the-art results, outperforming previous methods across various metrics. Our code is available at: https://github.com/inversynth/InverSynth2.
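The core idea behind the proxy can be illustrated in miniature. In this hypothetical sketch (not the paper's actual model), the synthesizer-proxy is reduced to a fixed linear map from synthesizer parameters to an audio feature vector; because the map is differentiable, an initial parameter estimate can be refined at inference time by gradient descent on the reconstruction error, with no ground-truth labels:

```python
import numpy as np

# Toy stand-in for the differentiable synthesizer-proxy (an assumption for
# illustration): a fixed linear map W from 8 synth parameters to a
# 64-dimensional audio feature vector.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 8))


def proxy(p):
    """Differentiable 'synthesis': parameters -> audio features."""
    return W @ p


def refine(p0, target, steps=200, lr=1e-3):
    """Self-supervised inference-time refinement: gradient descent on
    ||proxy(p) - target||^2, starting from an initial prediction p0."""
    p = p0.copy()
    for _ in range(steps):
        residual = proxy(p) - target
        grad = 2.0 * W.T @ residual  # analytic gradient of the squared error
        p -= lr * grad
    return p


true_params = rng.uniform(size=8)
target_audio = proxy(true_params)   # the input sound to be matched
p0 = rng.uniform(size=8)            # crude initial parameter estimate

loss_before = np.sum((proxy(p0) - target_audio) ** 2)
p_refined = refine(p0, target_audio)
loss_after = np.sum((proxy(p_refined) - target_audio) ** 2)
```

In the paper's setting, the proxy is a learned neural network approximating a real (non-differentiable) synthesizer, and the initial estimate comes from a trained encoder; the refinement loop above plays the role of the inference-time finetuning step.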
