LP-30: Improving Embeddings in Harmony Transformer

Ebrahimzadeh, Maral*, Krug, Valerie, Stober, Sebastian

Abstract: Learning the harmonic structure of music is crucial for various music information retrieval (MIR) tasks. Word2vec skip-gram, a well-established technique in natural language processing, has been found to learn harmonic concepts in music effectively. It represents music slices in a vector space in which geometric relationships between vectors remain musically meaningful. These embeddings are promising inputs for MIR tasks, particularly automatic chord recognition (ACR). However, ACR research predominantly focuses on audio data because well-annotated symbolic music datasets are scarce. In this work, we propose an approach based on the Harmony Transformer (HT) architecture by Chen and Su. Instead of learning the input embedding within the network, we use skip-gram as an unsupervised embedding technique. Our experiments show that this unsupervised method produces embeddings that capture harmonic concepts. We also introduce a novel visualization method to assess how faithfully these embeddings represent harmonic musical concepts. We perform our experiments on the Lakh MIDI and BPS-FH datasets.
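
To make the skip-gram idea concrete, the following minimal sketch shows how (center, context) training pairs could be formed from a sequence of music slices. It is an illustration only, not the authors' pipeline: the function name `skipgram_pairs`, the window size, and the encoding of each slice as a set of pitch classes are all assumptions made for this example.

```python
from typing import Hashable, List, Sequence, Tuple


def skipgram_pairs(
    slices: Sequence[Hashable], window: int = 2
) -> List[Tuple[Hashable, Hashable]]:
    """Pair every slice with each neighbour at most `window` positions away."""
    pairs = []
    for i, center in enumerate(slices):
        lo = max(0, i - window)
        hi = min(len(slices), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a slice is never its own context
                pairs.append((center, slices[j]))
    return pairs


# Hypothetical toy piece: each slice is the set of pitch classes sounding in it
# (assumed encoding; 0 = C, 4 = E, 7 = G, and so on).
piece = [
    frozenset({0, 4, 7}),   # C major
    frozenset({5, 9, 0}),   # F major
    frozenset({7, 11, 2}),  # G major
    frozenset({0, 4, 7}),   # C major
]

pairs = skipgram_pairs(piece, window=1)
for center, context in pairs:
    print(sorted(center), "->", sorted(context))
```

A skip-gram model trained on such pairs learns one vector per distinct slice by predicting context slices from the center slice, so slices that occur in similar harmonic surroundings end up close together. Under the approach described in the abstract, vectors of this kind would replace the embedding layer that is otherwise learned inside the network.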