Centre for Multimodal AI at ICASSP 2026
Centre for Multimodal AI, 20 April 2026
From 4 to 8 May 2026, several CMAI researchers will participate in the 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026). ICASSP is the leading conference in the field of signal processing and the flagship event of the IEEE Signal Processing Society.
As in previous years, the Centre for Multimodal AI will have a strong presence at the conference, both in numbers and in overall impact. The following papers, authored or co-authored by CMAI members, will be presented in the main ICASSP 2026 track:
- Chain-of-Caption: Training-free improvement of multimodal large language model on referring expression comprehension, by Yik Lung Pang, Changjae Oh
- Consistency-aware learning for unbiased visual question answer, by Xinyu Jiang, Qiang Lu, Liang Zhao, Yunfei Long, Zhenfang Zhu, Jianyong Chai
- Making Dialogue Grounding Data Rich: A Three-Tier Data Synthesis Framework for Generalized Referring Expression Comprehension, by Juexi Shao, Siyou Li, Yujian Gan, Chris Madge, Vanja Karan, Massimo Poesio
- RAVE: Retrieval and Scoring Aware Verifiable Claim Detection, by Yufeng Li, Arkaitz Zubiaga
- Diffusion Timbre Transfer Via Mutual Information Guided Inpainting, by Ching Ho Lee, Javier Nistal, Stefan Lattner, Marco Pasini, George Fazekas
- Towards Effective Negation Modeling in Joint Audio-Text Models for Music, by Yannis Vasilakis, Rachel Bittner, Johan Pauwels
- Domain-Invariant Representation Learning of Bird Sounds, by Ilyass Moummad, Romain Serizel, Emmanouil Benetos, Nicolas Farrugia
- The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs, by Brandon Carone, Iran Roman, Pablo Ripollés
- Beat and Downbeat Detection: A Reformulated Approach, by James Bolt, Johan Pauwels, George Fazekas
- Learning Vocal-Tract Area and Radiation with a Physics-Informed Webster Model, by Minhui Lu, Joshua D. Reiss
- Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation, by Aditya Bhattacharjee, Marco Pasini, Emmanouil Benetos
- Audio-to-Score Jazz Solo Transcription with the Rhythm Perceiver, by Ivan Shanin, Xavier Riley, Simon Dixon
The following papers, published in IEEE or EURASIP journals, will also be presented at the conference:
- Neural Audio Synthesis for Sound Effects: A Scope Review, by Mateo Cámara, Fernando Marcos, Anders Bargum, Cumhur Erkut, Joshua Reiss, José Luis Blanco
  Published in the IEEE Transactions on Audio, Speech and Language Processing
- Domain Adaptation of Few-Shot Bioacoustic Event Detection in Different Environments, by Yizhou Tan, Haojun Ai, Shengchen Li, György Fazekas
  Published in the IEEE Transactions on Audio, Speech and Language Processing
- Parameter optimisation for a physical model of the vocal system, by Mateo Cámara, José Luis Blanco, Joshua D. Reiss
  Published in the EURASIP Journal on Audio, Speech, and Music Processing
- Acoustic Prompt Tuning: Empowering Large Language Models With Audition Capabilities, by Jinhua Liang, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos
  Published in the IEEE Transactions on Audio, Speech and Language Processing
- Velocity2DMs: A Contextual Modeling Approach to Dynamics Marking Prediction in Piano Performance, by Hyon Kim, Emmanouil Benetos, Xavier Serra
  Published in the IEEE Signal Processing Letters
See you in Barcelona!
People: Changjae OH, Yunfei LONG, Massimo POESIO, Arkaitz ZUBIAGA, George FAZEKAS, Johan PAUWELS, Emmanouil BENETOS, Iran ROMAN, Josh REISS, Simon DIXON
Updated by: Emmanouil Benetos
