PhD INPhINIT “la Caixa” fellowship on Audio-visual approaches for music information retrieval

-Research Project:

Music is a highly multimodal concept, in which various types of heterogeneous information are associated with a music piece (audio, musician’s gestures and facial expression, lyrics, etc.). This has recently led researchers to approach music through its various facets, giving rise to multimodal methods for content-based semantic description of music material. In this project, we investigate the complementarity of audio and image/video description algorithms for the automatic description and indexing of user-generated music performance videos. We address relevant music information research tasks, in particular musical instrument recognition, music transcription, synchronization of audio/video streams, similarity, quality assessment, structural analysis and segmentation, and automatic video mashup generation. To do so, we develop strategies to build multimedia repositories and gather human annotations.

Research topics: music information retrieval, music transcription, automatic classification, image processing, machine learning.

This research is related to the Maria de Maeztu Strategic Program on Data-Driven Knowledge Extraction. Our project deals with large-scale multimedia data. A video presenting the project is also available.


O. Slizovskaia, E. Gómez, G. Haro. Automatic musical instrument recognition in audiovisual recordings by combining image and audio classification strategies. 13th Sound and Music Computing Conference, SMC 2016.

O. Slizovskaia, E. Gómez, G. Haro. Musical Instrument Recognition in User-generated Videos using a Multimodal Convolutional Neural Network Architecture. ACM International Conference on Multimedia Retrieval, ICMR 2017.

P. Zinemanas, P. Arias, G. Haro, E. Gómez. Visual music transcription of clarinet video recordings trained with audio-based labelled data. ICCV 2017 Workshop on Computer Vision for Audio-Visual Media (CVAVM), 2017.


-Job position description:

This research project will be supervised by Dr. Emilia Gómez and Dr. Gloria Haro, permanent faculty members of, respectively, the Music Information Research Lab at the Music Technology Group (MTG) and the Image Processing Group (IPG) at Universitat Pompeu Fabra, Barcelona.

The PhD position is offered to a highly motivated researcher with a strong academic record, a solid background in software development, and experience in audio processing or computer vision. Good programming skills are expected, preferably in C/C++ and MATLAB/Python. Knowledge of deep-learning frameworks is a plus.