LP-5: AutoOsu: Audio-Aware Action Generation for Rhythm Games

Sihun Lee*, Dasaem Jeong

Abstract: Rhythm-based video games challenge players to match their actions to musical cues, turning songs into interactive experiences. Game charts, which dictate the timing and placement of on-screen notes, are manually crafted by players and developers. With AutoOsu, we introduce a CRNN-based model that generates rhythm game charts for a given audio track, conditioned on an intended difficulty level. Previous studies have often divided this task into two sub-tasks: onset detection, which determines the timing points for notes, and action generation, which distributes those notes among the available keys. These sub-tasks are typically handled by two separately trained models, with audio information given only to the onset detection model. We instead jointly train two recurrent layers that both receive audio information, which streamlines the training process and makes better use of musical features.
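
To make the described architecture concrete, the sketch below shows one plausible way to realize a CRNN with two jointly trained recurrent branches that both receive audio features and are conditioned on a difficulty level. All module names, shapes, and hyperparameters (mel-spectrogram input, a 4-key layout, GRU branches, a difficulty embedding) are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of a jointly trained CRNN chart generator.
# Every name, shape, and hyperparameter here is an illustrative
# assumption, not the AutoOsu authors' actual implementation.
import torch
import torch.nn as nn

class AutoOsuSketch(nn.Module):
    def __init__(self, n_mels=80, n_keys=4, n_difficulties=5, hidden=128):
        super().__init__()
        # Convolutional front end over (batch, 1, time, mels) spectrograms;
        # pooling only along frequency keeps the time resolution intact.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        feat_dim = 32 * (n_mels // 4)
        self.difficulty_emb = nn.Embedding(n_difficulties, 16)
        # Two recurrent branches; both see the same audio-derived features.
        self.onset_rnn = nn.GRU(feat_dim + 16, hidden,
                                batch_first=True, bidirectional=True)
        self.action_rnn = nn.GRU(feat_dim + 16 + 1, hidden,
                                 batch_first=True, bidirectional=True)
        self.onset_head = nn.Linear(2 * hidden, 1)        # per-frame onset logit
        self.action_head = nn.Linear(2 * hidden, n_keys)  # per-frame key logits

    def forward(self, mel, difficulty):
        # mel: (batch, time, n_mels); difficulty: (batch,) integer level
        x = self.conv(mel.unsqueeze(1))              # (batch, 32, time, n_mels//4)
        x = x.permute(0, 2, 1, 3).flatten(2)         # (batch, time, feat_dim)
        d = self.difficulty_emb(difficulty)          # (batch, 16)
        d = d.unsqueeze(1).expand(-1, x.size(1), -1) # broadcast over time
        xd = torch.cat([x, d], dim=-1)
        onset_out, _ = self.onset_rnn(xd)
        onset_logits = self.onset_head(onset_out)    # (batch, time, 1)
        # The action branch also sees the predicted onset activations,
        # so note placement can depend on where notes occur.
        action_in = torch.cat([xd, torch.sigmoid(onset_logits)], dim=-1)
        action_out, _ = self.action_rnn(action_in)
        action_logits = self.action_head(action_out) # (batch, time, n_keys)
        return onset_logits, action_logits
```

Under this sketch, joint training would amount to summing a per-frame onset loss (e.g., binary cross-entropy) with a per-key action loss and backpropagating through both branches and the shared convolutional front end at once, which is one way to realize the single streamlined training process the abstract describes.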