P2-06: Decoding Drums, Instrumentals, Vocals, and Mixed Sources in Music Using Human Brain Activity With fMRI
Vincent K.M. Cheung (Sony Computer Science Laboratories, Inc.)*, Lana Okuma (RIKEN), Kazuhisa Shibata (RIKEN), Kosetsu Tsukuda (National Institute of Advanced Industrial Science and Technology (AIST)), Masataka Goto (National Institute of Advanced Industrial Science and Technology (AIST)), Shinichi Furuya (Sony Computer Science Laboratories, Inc.)
Subjects (starting with primary): Human-centered MIR -> user behavior analysis and mining, user modeling; Knowledge-driven approaches to MIR -> cognitive MIR; Human-centered MIR -> human-computer interaction; Human-centered MIR -> user-centered evaluation
Presented In Person: 4-minute short-format presentation
Brain decoding allows the read-out of stimulus and mental content from neural activity, and has been utilised in various neural-driven classification tasks relevant to the music information retrieval community. However, even the relatively simple task of instrument classification has only been demonstrated for single- or few-note stimuli when decoding from neural data recorded using functional magnetic resonance imaging (fMRI). Here, we show that drums, instrumentals, vocals, and mixed sources of naturalistic musical stimuli can be decoded from single-trial spatial patterns of auditory cortex activation recorded using fMRI. Comparing classification based on convolutional neural networks (CNN), random forests (RF), and support vector machines (SVM) further revealed similar neural encoding of vocals and mixed sources, despite vocals being the most easily identifiable. These results highlight the prominence of vocal information during music perception, and illustrate the potential of using neural representations to evaluate music source separation performance and inform future algorithm design.
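To make the decoding setup concrete, below is a minimal, hypothetical sketch (not the authors' code) of 4-class source decoding from single-trial fMRI voxel patterns, comparing RF and SVM baselines with scikit-learn. The data shapes, labels, and preprocessing are assumptions for illustration only; the CNN comparison described in the abstract is omitted here for brevity and would be evaluated under the same cross-validation scheme.

```python
# Hypothetical sketch: decode drums / instrumentals / vocals / mixed sources
# from single-trial auditory-cortex activation patterns (placeholder data).
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder data: n_trials single-trial activation estimates over
# n_voxels auditory-cortex voxels; y holds the four source classes.
n_trials, n_voxels = 400, 2000
X = rng.standard_normal((n_trials, n_voxels))   # e.g. per-trial beta maps (assumed)
y = rng.integers(0, 4, size=n_trials)           # 0=drums, 1=instrumentals, 2=vocals, 3=mixed

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

classifiers = {
    "RF":  RandomForestClassifier(n_estimators=500, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0)),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f} (chance = 0.25)")
```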