LP-27: Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark

Cífka, Ondřej*, Dimitriou, Constantinos, Wang, Cheng-i, Schreiber, Hendrik, Miner, Luke, Stöter, Fabian-Robert

Abstract: In this paper, we discuss evaluation for the automatic lyrics transcription (ALT) task. We argue that existing lyrics transcription benchmarks, which focus primarily on word content, may overlook the finer nuances of written lyrics. This can lead to a mismatch with the creative intent of musicians and songwriters, as well as with listeners' experiences. For example, the absence of line breaks can strip the lyrics of their original rhythm, emotional emphasis, and rhyme scheme. To address this issue, we introduce Jam-ALT, a new lyrics transcription benchmark based on the JamendoLyrics dataset. It includes an enhanced version of the data, geared specifically towards ALT evaluation, that follows the music industry's transcription and formatting guidelines for lyrics, particularly with regard to punctuation, line breaks, letter case, and non-word vocal sounds. We also propose a suite of evaluation metrics that go beyond the traditional word error rate and are designed to capture these aspects. We hope that the proposed benchmark contributes to the ALT task by enabling more precise and reliable assessments of transcription systems and by improving the user experience in lyrics applications such as subtitle rendering for live captioning or karaoke.