P2-11: The Coordinated Corpus of Popular Musics (CoCoPops): A Meta-Dataset of Melodic and Harmonic Transcriptions
Claire Arthur (Georgia Institute of Technology)*, Nathaniel Condit-Schultz (Georgia Institute of Technology)
Subjects (starting with primary): Knowledge-driven approaches to MIR -> cognitive MIR ; Computational musicology -> digital musicology ; Musical features and properties -> melody and motives ; Evaluation, datasets, and reproducibility -> novel datasets and use cases ; Computational musicology -> systematic musicology ; Musical features and properties -> harmony, chords and tonality
Presented In Person: 4-minute short-format presentation
This paper introduces a new corpus, CoCoPops: The Coordinated Corpus of Popular Musics. The corpus can be considered a “meta corpus” in that it both extends and combines two existing corpora: the widely used McGill Billboard
corpus and the RS200 corpus. Both contain expert harmonic annotations, but they use different encoding schemes and
represent harmony in fundamentally different ways: the Billboard corpus uses a root-quality representation, whereas the RS200 uses Roman numerals. By combining these corpora
into a unified format, using the well-known **kern and **harm representations, we aim to facilitate research in computational musicology, which is frequently burdened
by corpora spread across multiple encoding formats. The format will also facilitate cross-corpus comparison with the large body of existing works in **kern format. For a
100-song subset of the CoCoPops-Billboard collection, we also provide continuous participant ratings of valence and arousal, along with the RMS (root mean square) signal level and associated timestamps. In this paper we describe the corpus and the procedures used to create it.
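
To make the contrast between the two harmonic representations concrete, the sketch below relabels a root-quality chord as a Roman numeral relative to a tonic. It is an illustration only, not the conversion procedure used to build CoCoPops. The function names, the `root:quality` input syntax (e.g. `A:min`), the restriction to major and minor triads, and the choice of flat spellings for chromatic degrees are all assumptions made for this example.

```python
# Illustrative sketch only: hypothetical helpers, not the CoCoPops pipeline.
# Assumes chord labels of the form "root:quality" (e.g. "A:min", "Bb:maj").

NATURAL_PITCH_CLASSES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Chromatic scale degrees above the tonic; flat spellings are an assumption.
DEGREES = ["I", "bII", "II", "bIII", "III", "IV",
           "bV", "V", "bVI", "VI", "bVII", "VII"]


def pitch_class(name: str) -> int:
    """Return the pitch class (0-11) of a note name such as 'C', 'F#' or 'Bb'."""
    pc = NATURAL_PITCH_CLASSES[name[0].upper()]
    for accidental in name[1:]:
        if accidental == "#":
            pc += 1
        elif accidental == "b":
            pc -= 1
        else:
            raise ValueError(f"unrecognised accidental in {name!r}")
    return pc % 12


def to_roman(chord: str, tonic: str) -> str:
    """Relabel a root-quality chord as a Roman numeral relative to `tonic`.

    Minor chords are written in lower case; anything else is left in upper case.
    """
    root, _, quality = chord.partition(":")
    interval = (pitch_class(root) - pitch_class(tonic)) % 12
    numeral = DEGREES[interval]
    # The "b" prefix is already lower case, so lower() only affects the numeral.
    return numeral.lower() if quality.startswith("min") else numeral


if __name__ == "__main__":
    for chord in ["C:maj", "A:min", "F:maj", "G:maj", "Bb:maj"]:
        print(chord, "->", to_roman(chord, tonic="C"))
```

In the key of C, this maps `A:min` to `vi` and `Bb:maj` to `bVII`. The same chord label yields a different numeral in a different key, which is why a root-quality encoding and a Roman-numeral encoding cannot be compared directly without key information.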