A Repetition-Based Triplet Mining Approach for Music Segmentation

Morgan Buisson (Telecom-Paris)*; Brian McFee (New York University); Slim Essid (Telecom Paris - Institut Polytechnique de Paris); Helene-Camille Crayencour (CNRS)

P4-02: A Repetition-Based Triplet Mining Approach for Music Segmentation

Subjects (starting with primary): MIR tasks -> pattern matching and detection ; Knowledge-driven approaches to MIR -> representations of music ; Musical features and properties -> structure, segmentation, and form ; Musical features and properties -> representations of music ; Knowledge-driven approaches to MIR -> machine learning/artificial intelligence for music

Presented In Person: 4-minute short-format presentation

Abstract:

Contrastive learning has recently emerged as a well-suited method for learning representations of music audio signals for structural segmentation. However, most existing unsupervised training strategies ignore repetition and therefore fail to capture this essential aspect of music structure. This work introduces a triplet mining method that explicitly considers repeating sequences within a music track by leveraging common audio descriptors. We study its impact on the learned representations through downstream music segmentation. Because musical repetitions can be of different natures, we give further insight into the role of the audio descriptors used at the triplet mining stage, as well as the trade-off between the quality of the mined triplets and the quantity of unlabelled data used for training. We observe that our method requires less unannotated data while remaining competitive with other unsupervised methods trained on a larger corpus.
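
To make the idea of mining triplets from repetition concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the function name `mine_triplets`, the `exclusion` parameter, and the frame-wise nearest/farthest selection rule are all assumptions made for illustration. It takes per-frame audio descriptors, builds a cosine self-similarity matrix, and for each anchor frame picks the most similar frame outside a temporal neighbourhood as the positive and the least similar such frame as the negative.

```python
import numpy as np

def mine_triplets(features, exclusion=2):
    """Hypothetical repetition-based triplet miner (illustrative only).

    features:  (n_frames, n_dims) array of per-frame audio descriptors.
    exclusion: frames within this distance of the anchor are ignored, so
               that positives come from repetitions rather than from
               plain temporal proximity.
    Returns an (n_frames, 3) integer array of (anchor, positive, negative).
    Assumes n_frames > 2 * exclusion + 1 so every anchor has candidates.
    """
    # Cosine self-similarity between all pairs of frames.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sim = unit @ unit.T

    # Mask out frames that are temporally close to the anchor.
    idx = np.arange(sim.shape[0])
    near = np.abs(idx[:, None] - idx[None, :]) <= exclusion

    # Positive: most similar distant frame. Negative: least similar one.
    positives = np.where(near, -np.inf, sim).argmax(axis=1)
    negatives = np.where(near, np.inf, sim).argmin(axis=1)
    return np.stack([idx, positives, negatives], axis=1)

# Toy track with structure A A A B B B A A A, using one-hot "descriptors".
toy = np.array([[1, 0]] * 3 + [[0, 1]] * 3 + [[1, 0]] * 3, dtype=float)
triplets = mine_triplets(toy, exclusion=2)
```

On this toy input, the first frame of section A is paired with a frame from the repeated A section as its positive and a frame from section B as its negative. Frames in the B section have no repetition elsewhere in the track, so their positives are unreliable, which hints at the quality-versus-quantity trade-off mentioned in the abstract.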