LV-46: BEAT AND DOWNBEAT TRACKING WITH GENERATIVE EMBEDDINGS

Tian, Haokun*, Liu, Kun, Fuentes, Magdalena

Abstract: Spectrograms are the standard input features for discriminative MIR tasks. However, recent research has shown that representations produced by Jukebox, a music language model, lead to better model performance on music tagging, genre classification, key detection, emotion recognition, and music transcription. In this paper, we test whether this advantage extends to beat and downbeat tracking. Specifically, we compare compressed Jukebox embeddings with spectrograms as input to a model that jointly predicts beat, downbeat, and tempo. Experiments show that the two inputs yield comparable results for beat tracking, while Jukebox embeddings lead to significant improvements for downbeat tracking.