P5-07: ScorePerformer: Expressive Piano Performance Rendering With Fine-Grained Control
Ilya Borovik (Skolkovo Institute of Science and Technology)*, Vladimir Viro (Peachnote)
Subjects (starting with primary): MIR tasks -> alignment, synchronization, and score following; MIR fundamentals and methodology -> symbolic music processing; Knowledge-driven approaches to MIR -> machine learning/artificial intelligence for music; Musical features and properties -> representations of music; Musical features and properties -> expression and performative aspects of music
Presented In Person: 4-minute short-format presentation
We present ScorePerformer, an encoder-decoder transformer with hierarchical style encoding heads for controllable rendering of expressive piano music performances. We design a tokenized representation of symbolic score and performance music, the Score Performance Music tuple (SPMuple), and validate a novel way to encode the local performance tempo in a local note time window. Along with the encoding, we extend a transformer encoder with multi-level maximum mean discrepancy variational autoencoder style modeling heads that learn performance style at the global, bar, beat, and onset levels for fine-grained performance control. To offer an interpretation of the learned latent spaces, we introduce performance direction marking classifiers that associate vectors in the latent space with direction markings to guide performance rendering through the model. Evaluation results show the importance of the architectural design choices and demonstrate that ScorePerformer produces diverse and coherent piano performances that follow the control input.
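The maximum mean discrepancy (MMD) term named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth, the 4-dimensional latents, and all function names are assumptions chosen for illustration. The sketch computes a biased estimate of squared MMD between a batch of latent style vectors and samples from a standard normal prior, which is the quantity an MMD-based variational autoencoder would penalize in place of the usual KL term.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(latents, prior_samples, bandwidth=1.0):
    """Biased estimate of squared MMD between two sample sets."""
    return (
        mean_kernel(latents, latents, bandwidth)
        + mean_kernel(prior_samples, prior_samples, bandwidth)
        - 2.0 * mean_kernel(latents, prior_samples, bandwidth)
    )


def gaussian_samples(n, dim, mean, rng):
    """Draw n vectors of size dim from N(mean, 1) per coordinate."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


# Hypothetical data: stand-ins for style latents at one hierarchy level.
rng = random.Random(0)
prior = gaussian_samples(100, 4, 0.0, rng)
matched = gaussian_samples(100, 4, 0.0, rng)   # latents that follow the prior
shifted = gaussian_samples(100, 4, 2.0, rng)   # latents that drifted away

mmd_matched = mmd_squared(matched, prior)
mmd_shifted = mmd_squared(shifted, prior)
print(f"matched: {mmd_matched:.4f}, shifted: {mmd_shifted:.4f}")
```

Latents drawn from the prior give a value close to zero, while the shifted batch gives a clearly larger one. In a multi-level setup such as the one described above, a term of this kind could be computed separately for the global, bar, beat, and onset latents, though how the authors weight or combine them is not stated here.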