P1-02: TriAD: Capturing Harmonics With 3D Convolutions
Miguel Perez Fernandez (Universitat Pompeu Fabra, Huawei)*, Holger Kirchhoff (Huawei), Xavier Serra (Universitat Pompeu Fabra)
Subjects (starting with primary): MIR tasks -> pattern matching and detection ; Knowledge-driven approaches to MIR -> representations of music ; Musical features and properties -> representations of music ; Knowledge-driven approaches to MIR -> machine learning/artificial intelligence for music ; MIR fundamentals and methodology -> music signal processing ; MIR tasks -> music transcription and annotation
Presented In Person: 4-minute short-format presentation
Thanks to advancements in deep learning (DL), automatic music transcription (AMT) systems have recently outperformed earlier systems based entirely on manual feature design. Many of these highly capable DL models, however, are computationally expensive. Researchers are moving towards smaller models that maintain state-of-the-art (SOTA) results by embedding musical knowledge in the network architecture. Existing approaches employ convolutional blocks specifically designed to capture the harmonic structure. These approaches, however, require either large kernels or multiple kernels, with each kernel aiming to capture a different harmonic. We present TriAD, a convolutional block that achieves an unequally spaced dilation over the frequency axis. This allows our method to capture multiple harmonics with a single, small kernel. We compare TriAD with other methods of capturing harmonics and observe that our approach maintains SOTA results while reducing the number of parameters required. We also conduct an ablation study showing that our proposed method does indeed rely on harmonic information.
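To illustrate why capturing several harmonics calls for unequally spaced dilation, the sketch below computes where the first few harmonics of a note fall on a log-frequency axis. It is not the TriAD block itself; the function name `harmonic_bin_offsets` and the resolution of 12 bins per octave are assumptions made for illustration only.

```python
import math


def harmonic_bin_offsets(num_harmonics, bins_per_octave=12):
    """Bin offset of each harmonic above the fundamental on a log-frequency axis.

    Harmonic h lies at h times the fundamental frequency, i.e. log2(h)
    octaves above it. bins_per_octave=12 is an assumed resolution.
    """
    return [round(bins_per_octave * math.log2(h))
            for h in range(1, num_harmonics + 1)]


offsets = harmonic_bin_offsets(6)
# Distance between consecutive harmonics, in bins.
gaps = [b - a for a, b in zip(offsets, offsets[1:])]

print(offsets)  # [0, 12, 19, 24, 28, 31]
print(gaps)     # [12, 7, 5, 4, 3]
```

Because the gaps shrink as the harmonic number grows, a kernel with a single fixed dilation along the frequency axis cannot align its taps with more than a couple of harmonics at once, which is the limitation the abstract attributes to large-kernel and multi-kernel designs.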