P1-11: A Few-Shot Neural Approach for Layout Analysis of Music Score Images
Francisco J. Castellanos (University of Alicante)*, Antonio Javier Gallego (Universidad de Alicante), Ichiro Fujinaga (McGill University)
Subjects (starting with primary): MIR tasks -> optical music recognition; Knowledge-driven approaches to MIR -> machine learning/artificial intelligence for music
Presented In Person: 4-minute short-format presentation
Optical Music Recognition (OMR) is a well-established research field focused on reading musical notation from images of music scores. In the standard OMR workflow, layout analysis is a critical component for identifying relevant parts of the image, such as staff lines, text, or notes. State-of-the-art approaches to this task are based on machine learning, which requires labeling a training corpus: an error-prone, laborious, and expensive task that must be performed by experts. In this paper, we propose a novel few-shot strategy for building robust models from only partial annotations, thereby requiring minimal human effort. Specifically, we introduce a masking layer and an oversampling technique to train models using a small set of annotated patches from the training images. Our proposal achieves high performance even with scarce training data, as demonstrated by experiments on four benchmark datasets. The results indicate that this approach performs comparably to models trained on a fully annotated corpus while requiring annotation of only 20% to 39% of the data.
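As a rough illustration of the two ideas named in the abstract (masking and oversampling), the Python sketch below shows one way a loss could ignore unannotated pixels and how training patches could be drawn repeatedly from a small annotated region. It is not the authors' implementation: the function names `masked_bce` and `oversample_patches`, the choice of binary cross-entropy, and the sampling scheme are all assumptions made for illustration.

```python
import numpy as np


def masked_bce(pred, target, mask, eps=1e-7):
    """Binary cross-entropy averaged over annotated pixels only.

    Hypothetical helper: `mask` is 1 where a human annotation exists
    and 0 elsewhere, so unannotated pixels contribute nothing.
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    per_pixel = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    n_annotated = mask.sum()
    if n_annotated == 0:
        return 0.0
    return float((per_pixel * mask).sum() / n_annotated)


def oversample_patches(mask, patch, n, rng):
    """Draw n top-left corners of patch x patch windows that each
    contain at least one annotated pixel (sampling with replacement).

    Assumed scheme: pick an annotated pixel at random, centre a window
    on it, then shift the window so it stays inside the image.
    """
    h, w = mask.shape
    if patch > h or patch > w:
        raise ValueError("patch is larger than the image")
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no annotated pixels")
    idx = rng.integers(0, ys.size, size=n)
    tops = np.clip(ys[idx] - patch // 2, 0, h - patch)
    lefts = np.clip(xs[idx] - patch // 2, 0, w - patch)
    return list(zip(tops.tolist(), lefts.tolist()))


# Toy example: an 8x8 image where only a 2x2 region is annotated.
target = np.zeros((8, 8))
mask = np.zeros((8, 8))
target[2:4, 2:4] = 1.0
mask[2:4, 2:4] = 1.0

# The prediction is correct inside the annotated region and
# uninformative (0.5) everywhere else.
pred = np.full((8, 8), 0.5)
pred[2:4, 2:4] = 1.0

loss = masked_bce(pred, target, mask)
corners = oversample_patches(mask, patch=4, n=16, rng=np.random.default_rng(0))
```

In this toy case the masked loss is close to zero because the uninformative predictions outside the annotated region are ignored, and all 16 sampled windows overlap the small annotated area, which is the effect oversampling is meant to have when annotations are scarce.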