LP-35: AUTOMATIC TRANSCRIPTION OF MULTI-INSTRUMENTAL SONGS: INTEGRATING DEMIXING, HARMONIC DILATED CONVOLUTION, AND JOINT BEAT TRACKING

Monstad, Lars L*, Lartillot, Olivier

Abstract: In the rapidly expanding field of music information retrieval (MIR), automatic transcription remains one of the most sought-after capabilities, especially for songs that employ multiple instruments. Musscribe is a state-of-the-art transcription tool that addresses this challenge by integrating three distinct methodologies: demixing, harmonic dilated convolution, and joint beat tracking. Demixing isolates the individual instruments within a song by separating overlapping audio sources, so that each instrument is transcribed separately. Beat tracking runs as a parallel process to jointly estimate beats and downbeats. These processes result in an output MIDI file, which is then quantized using information derived from the beat tracking. Together, these methodologies allow us to produce transcriptions that are not only accurate but also highly representative of the original compositions. Preliminary tests and evaluations show the potential for transcribing complex musical pieces with high fidelity, outperforming many contemporary tools on the market. This approach has implications not only for music transcription but also for broader applications in audio analysis, remixing, and digital music production. As such, it paves the way for more accurate and sophisticated analyses, bridging the gap between human and machine understanding of music. The model has been instrumental in accelerating the composition process for several Norwegian television shows. Moreover, its efficacy can be observed in the Netflix series "A Storm for Christmas." Renowned composer Peter Baden used this tool to enhance his workflow, demonstrating the demand for innovative tools like this in the professional music industry.
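
The quantization step mentioned in the abstract can be illustrated with a minimal sketch. It is not Musscribe's implementation: the function names `build_grid` and `quantize`, the `subdivisions` parameter, and the nearest-grid-point snapping rule are all assumptions made for illustration. The sketch takes note onset times and tracked beat times (both in seconds), subdivides each inter-beat interval evenly, and snaps every onset to the closest grid point.

```python
from bisect import bisect_left


def build_grid(beat_times, subdivisions=4):
    """Subdivide each inter-beat interval into equal steps (hypothetical helper)."""
    grid = []
    for start, end in zip(beat_times, beat_times[1:]):
        step = (end - start) / subdivisions
        grid.extend(start + i * step for i in range(subdivisions))
    grid.append(beat_times[-1])  # include the final beat itself
    return grid


def quantize(onsets, beat_times, subdivisions=4):
    """Snap each onset (seconds) to the nearest point of the beat-derived grid.

    Assumption: nearest-neighbour snapping; a real system may use a
    different rule or also quantize note durations.
    """
    grid = build_grid(beat_times, subdivisions)
    snapped = []
    for t in onsets:
        i = bisect_left(grid, t)
        # Only the grid points just below and just above t can be nearest.
        candidates = grid[max(i - 1, 0):i + 1]
        snapped.append(min(candidates, key=lambda g: abs(g - t)))
    return snapped


if __name__ == "__main__":
    beats = [0.0, 0.5, 1.0]            # tracked beat times in seconds
    notes = [0.02, 0.27, 0.61, 0.98]   # transcribed note onsets in seconds
    print(quantize(notes, beats, subdivisions=2))
```

Because the grid is built from the tracked beats rather than from a fixed tempo, the snapping follows tempo changes in the recording.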