LP-41: Optimizing the Mridangam Stroke Transcription Pipeline: Addressing Key Challenges

Krishnan, Gopika*, Ganguli, Kaustuv, Guedes, Carlos

Abstract: In this study, we examined different facets of the mridangam stroke transcription pipeline. We used datasets comprising mridangam strokes, some generated artificially with a browser application that synthesizes strokes from syllables typed into a text window, and others authentically recorded. We investigated the effectiveness of clustering techniques for stroke classification, and we also fine-tuned pre-trained residual neural networks (ResNet) on the same task. Our initial findings, supported by existing literature, underscore the need for additional steps to handle composite strokes (strokes created by combining two basic strokes) and to accommodate full-length mridangam recordings. As part of this broader exploration, we also studied audio segmentation using onset detection techniques, with the aim of segmenting mridangam recordings into individual strokes. Our preliminary findings suggest that spectral flux onset detection, followed by a post-processing step based on a local average computation, produces promising results.
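
The abstract does not give the parameters or implementation of the onset detector. As an illustration only, the following is a minimal NumPy sketch of one common formulation: log-compressed spectral flux as the novelty function, subtraction of a moving (local) average, then simple peak picking. All function names, the frame and hop sizes, the compression factor, the averaging width, the threshold, and the minimum gap are assumptions chosen for the sketch, not values from the study.

```python
import numpy as np


def spectral_flux(signal, frame_size=1024, hop_size=256, gamma=1.0):
    """Half-wave rectified spectral flux, one value per frame (assumed settings)."""
    if len(signal) < frame_size:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    frames = np.stack([
        signal[i * hop_size: i * hop_size + frame_size] * window
        for i in range(n_frames)
    ])
    # Log-compressed magnitude spectrum of every frame.
    mag = np.log1p(gamma * np.abs(np.fft.rfft(frames, axis=1)))
    # Keep only increases in each frequency bin, then sum over bins.
    flux = np.sum(np.maximum(np.diff(mag, axis=0), 0.0), axis=1)
    return np.concatenate([[0.0], flux])  # the first frame has no predecessor


def subtract_local_average(novelty, half_width=8):
    """Subtract a centered moving average and clip negative values to zero."""
    kernel = np.ones(2 * half_width + 1) / (2 * half_width + 1)
    padded = np.pad(novelty, half_width)  # zero padding at both ends
    local_avg = np.convolve(padded, kernel, mode="valid")
    return np.maximum(novelty - local_avg, 0.0)


def pick_peaks(novelty, min_gap, rel_threshold=0.2):
    """Indices of local maxima above a relative threshold.

    When two candidates are closer than min_gap frames, the larger one is kept.
    """
    if len(novelty) == 0:
        return []
    threshold = rel_threshold * novelty.max()
    peaks = []
    for i in range(1, len(novelty) - 1):
        is_peak = novelty[i] > novelty[i - 1] and novelty[i] >= novelty[i + 1]
        if not is_peak or novelty[i] <= threshold:
            continue
        if peaks and i - peaks[-1] < min_gap:
            if novelty[i] > novelty[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


def detect_onsets(signal, sr, frame_size=1024, hop_size=256,
                  half_width=8, rel_threshold=0.2, min_gap_s=0.05):
    """Estimated onset times in seconds, reported at frame centers."""
    novelty = spectral_flux(signal, frame_size, hop_size)
    enhanced = subtract_local_average(novelty, half_width)
    min_gap = max(1, int(round(min_gap_s * sr / hop_size)))
    peaks = np.array(pick_peaks(enhanced, min_gap, rel_threshold), dtype=float)
    return (peaks * hop_size + frame_size / 2) / sr
```

Consecutive onset times returned by `detect_onsets` can then serve as segment boundaries for cutting a recording into candidate strokes. The local average subtraction acts as an adaptive threshold, so that a loud passage does not mask the quieter strokes that follow it; the width of the averaging window would need tuning to the tempo of the material.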