P5-09: Semi-Automated Music Catalog Curation Using Audio and Metadata
Brian Regan (Spotify)*, Desislava Hristova (Spotify), Mariano Beguerisse-Díaz (Spotify)
Subjects (starting with primary): Applications -> digital libraries and archives ; MIR fundamentals and methodology -> multimodality ; Knowledge-driven approaches to MIR -> machine learning/artificial intelligence for music ; Human-centered MIR -> user-centered evaluation ; MIR fundamentals and methodology -> metadata, tags, linked data, and semantic web
Presented In Person: 4-minute short-format presentation
We present a system that assists Subject Matter Experts (SMEs) in curating large online music catalogs. The system detects releases that are incorrectly attributed to an artist's discography (misattribution) and cases where a single artist's discography is incorrectly split (duplication), and it predicts suitable relocations for misattributed releases. We use historical discography corrections to train and evaluate the system's component models. These models combine vector representations of audio with metadata-based features, and the combined models outperform models based on audio or metadata alone. In three experiments with SMEs, our system detects misattribution in artist discographies with precision greater than 77% and duplication with precision greater than 71%, and, by combining the two approaches, predicts a correct relocation for misattributed releases with precision up to 45%.
These results demonstrate the potential of such proactive curation systems in saving valuable human time and effort by directing attention where it is most needed.
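As a rough illustration of what combining an audio representation with metadata-based features can look like, the sketch below builds one feature vector for a (release, artist) pair and computes precision as the fraction of flagged items that experts confirm. It is a minimal sketch under stated assumptions, not the authors' method: the three features (cosine similarity of the release's audio embedding to the artist's mean embedding, a record-label match flag, and name-token overlap) and all function names are hypothetical.

```python
import math
from typing import AbstractSet, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def pair_features(
    release_audio: Sequence[float],
    artist_audio: Sequence[Sequence[float]],
    release_label: str,
    artist_labels: AbstractSet[str],
    release_artist_name: str,
    artist_name: str,
) -> list[float]:
    """Hypothetical combined features: one audio feature plus two metadata features."""
    # Audio: how close the release sounds to the rest of the discography.
    audio_sim = cosine(release_audio, mean_vector(artist_audio))
    # Metadata (assumed): has this artist released on this label before?
    label_match = 1.0 if release_label in artist_labels else 0.0
    # Metadata (assumed): Jaccard overlap of lower-cased name tokens.
    a = set(release_artist_name.lower().split())
    b = set(artist_name.lower().split())
    name_overlap = len(a & b) / len(a | b) if a | b else 0.0
    return [audio_sim, label_match, name_overlap]


def precision(flagged: Sequence[bool], confirmed: Sequence[bool]) -> float:
    """Fraction of flagged items that an expert confirmed as true errors."""
    true_pos = sum(1 for f, c in zip(flagged, confirmed) if f and c)
    n_flagged = sum(1 for f in flagged if f)
    return true_pos / n_flagged if n_flagged else 0.0
```

For example, `pair_features([0.0, 1.0], [[1.0, 0.0], [0.8, 0.2]], "Label X", {"Label Y"}, "The Band", "The Band")` pairs a perfect name match with low audio similarity and no label history, the kind of disagreement between signals that neither audio nor metadata alone would expose. Any downstream classifier trained on such vectors is likewise an assumption here.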