P1-09: Impact of Time and Note Duration Tokenizations on Deep Learning Symbolic Music Modeling
Nathan Fradet (LIP6 - Sorbonne University)*, Nicolas Gutowski (University of Angers), Fabien Chhel (Groupe ESEO), Jean-Pierre Briot (CNRS)
Subjects (starting with primary): MIR tasks -> music generation ; MIR fundamentals and methodology -> symbolic music processing ; Applications -> music composition, performance, and production ; MIR tasks -> automatic classification ; Musical features and properties -> representations of music ; Applications -> music retrieval systems
Presented In Person: 4-minute short-format presentation
Symbolic music is widely used in various deep learning tasks, including generation, transcription, synthesis, and Music Information Retrieval (MIR). It is mostly employed with discrete models like Transformers, which require music to be tokenized, i.e., formatted into sequences of distinct elements called tokens. Tokenization can be performed in different ways, and recent research has focused on developing more efficient methods. However, the key differences between these methods are often unclear, and few studies have compared them. In this work, we analyze the current common tokenization methods and experiment with time and note duration representations. We compare how these two impactful design choices affect performance on several tasks, including composer classification, emotion classification, music generation, and sequence representation. We demonstrate that representing this information explicitly leads to better results, with the benefit depending on the task.
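To make the compared design choices concrete, the sketch below contrasts two common ways of encoding time and note duration as tokens: one attaches an explicit Duration token to each note, while the other leaves duration implicit between NoteOn and NoteOff events. This is a simplified toy illustration (the token names, time grid, and note format are assumptions), not the paper's exact vocabularies or implementation.

```python
# Toy illustration of two symbolic-music tokenization styles.
# Notes are (MIDI pitch, start tick, duration in ticks) -- a simplified
# assumption for demonstration, not the paper's actual data format.
notes = [(60, 0, 2), (64, 2, 1)]

def tokenize_explicit_duration(notes):
    """Explicit-duration style: each note carries its own Duration token."""
    tokens, clock = [], 0
    for pitch, start, dur in sorted(notes, key=lambda n: n[1]):
        if start > clock:  # advance time with a TimeShift token
            tokens.append(f"TimeShift_{start - clock}")
            clock = start
        tokens += [f"Pitch_{pitch}", f"Duration_{dur}"]
    return tokens

def tokenize_note_off(notes):
    """Implicit-duration style: duration spans from NoteOn to NoteOff."""
    events = []
    for pitch, start, dur in notes:
        events.append((start, f"NoteOn_{pitch}"))
        events.append((start + dur, f"NoteOff_{pitch}"))
    tokens, clock = [], 0
    for time, event in sorted(events):
        if time > clock:
            tokens.append(f"TimeShift_{time - clock}")
            clock = time
        tokens.append(event)
    return tokens

print(tokenize_explicit_duration(notes))
# ['Pitch_60', 'Duration_2', 'TimeShift_2', 'Pitch_64', 'Duration_1']
print(tokenize_note_off(notes))
# ['NoteOn_60', 'TimeShift_2', 'NoteOff_60', 'NoteOn_64', 'TimeShift_1', 'NoteOff_64']
```

Note that the same two notes yield sequences of different lengths and structure; the study's comparison concerns how such representational differences affect downstream model performance.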