News

Centre for Multimodal AI at ICASSP 2026

20 April 2026

From 4 to 8 May 2026, several researchers from the Centre for Multimodal AI (CMAI) will participate in the 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026). ICASSP is the leading conference in the field of signal processing and the flagship event of the IEEE Signal Processing Society.

As in previous years, the Centre for Multimodal AI will have a strong presence at the conference, both in the number of papers and in their overall impact. The following papers, authored or co-authored by CMAI members, will be presented in the main ICASSP 2026 track:

Chain-of-Caption: Training-free improvement of multimodal large language model on referring expression comprehension, by Yik Lung Pang, Changjae Oh
Consistency-aware learning for unbiased visual question answer, by Xinyu Jiang, Qiang Lu, Liang Zhao, Yunfei Long, Zhenfang Zhu, Jianyong Chai
Making Dialogue Grounding Data Rich: A Three-Tier Data Synthesis Framework for Generalized Referring Expression Comprehension, by Juexi Shao, Siyou Li, Yujian Gan, Chris Madge, Vanja Karan, Massimo Poesio
RAVE: Retrieval and Scoring Aware Verifiable Claim Detection, by Yufeng Li, Arkaitz Zubiaga
Diffusion Timbre Transfer Via Mutual Information Guided Inpainting, by Ching Ho Lee, Javier Nistal, Stefan Lattner, Marco Pasini, George Fazekas
Towards Effective Negation Modeling in Joint Audio-Text Models for Music, by Yannis Vasilakis, Rachel Bittner, Johan Pauwels
Domain-Invariant Representation Learning of Bird Sounds, by Ilyass Moummad, Romain Serizel, Emmanouil Benetos, Nicolas Farrugia
The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs, by Brandon Carone, Iran Roman, Pablo Ripollés
Beat and Downbeat Detection: A Reformulated Approach, by James Bolt, Johan Pauwels, George Fazekas
Learning Vocal-Tract Area and Radiation with a Physics-Informed Webster Model, by Minhui Lu, Joshua D. Reiss
Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation, by Aditya Bhattacharjee, Marco Pasini, Emmanouil Benetos
Audio-to-Score Jazz Solo Transcription with the Rhythm Perceiver, by Ivan Shanin, Xavier Riley, Simon Dixon

The following papers, which have been published in IEEE or EURASIP journals, will also be presented at the conference:

Neural Audio Synthesis for Sound Effects: A Scope Review, by Mateo Cámara, Fernando Marcos, Anders Bargum, Cumhur Erkut, Joshua Reiss, José Luis Blanco
Published in the IEEE Transactions on Audio, Speech and Language Processing
Domain Adaptation of Few-Shot Bioacoustic Event Detection in Different Environments, by Yizhou Tan, Haojun Ai, Shengchen Li, György Fazekas
Published in the IEEE Transactions on Audio, Speech and Language Processing
Parameter optimisation for a physical model of the vocal system, by Mateo Cámara, José Luis Blanco, Joshua D. Reiss
Published in the EURASIP Journal on Audio, Speech, and Music Processing
Acoustic Prompt Tuning: Empowering Large Language Models With Audition Capabilities, by Jinhua Liang, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos
Published in the IEEE Transactions on Audio, Speech and Language Processing
Velocity2DMs: A Contextual Modeling Approach to Dynamics Marking Prediction in Piano Performance, by Hyon Kim, Emmanouil Benetos, Xavier Serra
Published in the IEEE Signal Processing Letters

See you in Barcelona!

People: Changjae OH Yunfei LONG Massimo POESIO Arkaitz ZUBIAGA George FAZEKAS Johan PAUWELS Emmanouil BENETOS Iran ROMAN Josh REISS Simon DIXON

Updated by: Emmanouil Benetos