<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <atom:link href="https://www.seresearch.qmul.ac.uk/cmai/news/" rel="self" type="application/rss+xml" />
        <title>QMUL Centre for Multimodal AI News</title>
        <description>Here's the latest news from The Centre for Multimodal AI at QMUL</description>
        <link>https://www.seresearch.qmul.ac.uk/cmai/news/</link>
        <lastBuildDate>Mon, 04 May 2026 16:57:22 +0100</lastBuildDate>
        <image>
            <url>https://www.seresearch.qmul.ac.uk/design_local/images/SITE_QMUL_square_logo.png</url>
            <title>QMUL Centre for Multimodal AI News</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/</link>
            <description>News from Centre for Multimodal AI - click to visit</description>
        </image>
        <webMaster>QMUL S&amp;E Research Centres Webmaster (m.m.knight@qmul.ac.uk)</webMaster>
        <item>
            <title>Queen Mary hosts inaugural event of new London Interdisciplinary Music Research Initiative</title>
            <link>https://www.seresearch.qmul.ac.uk/chcc/news/5478/queen-mary-hosts-inaugural-event-of-new-london-interdisciplinary-music-research-initiative/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/cb8592d163b4c82132cb1e3830078327.jpg&quot; /&gt;

&lt;br&gt;Yesterday (29th April 2026), the Centre for Digital Music (C4DM), part of the School of Electronic Engineering and Computer Science, hosted the inaugural event of the London Interdisciplinary Music Research Initiative (LIMRI) - bringing together leading researchers and practitioners from across London to explore the science, scholarship, and art of expert musical performance.

The afternoon workshop, titled Interdisciplinary Conversations on Expert Performance, took place at Queen Mary University of London's Mile End Campus and featured four invited speakers from King's College London, Imperial College London, City St George's, University of London, and the Royal College of Music.

LIMRI is a newly launched cross-London network designed to foster collaboration and dialogue between researchers working across music, technology, science and the arts. The initiative is co-led by Dr Charalampos Saitis, Lecturer in Digital Music Processing, alongside colleagues from Goldsmiths, Kingston University London, and King's College London.

LIMRI was formally launched in December 2025. Yesterday's workshop was the first in a series of themed research events that LIMRI plans to host at different institutions across London. Further details about the initiative can be found on the LIMRI website.</description>
            <category>Public news</category>
            <pubDate>Wed, 29 Apr 2026 23:00:00 +0100</pubDate>
            <guid>news5478</guid>
        </item>
        <item>
            <title>Centre for Multimodal AI at ICASSP 2026</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/5445/centre-for-multimodal-ai-at-icassp-2026/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/d77d146300eb645168c5479c238e08c6.jpg&quot; /&gt;

&lt;br&gt;On 4-8 May 2026, several CMAI researchers will participate in the 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026). ICASSP is the leading conference in the field of signal processing and the flagship event of the IEEE Signal Processing Society.

As in previous years, the Centre for Multimodal AI will have a strong presence at the conference, both in terms of numbers and overall impact. The following papers, authored or co-authored by CMAI members, will be presented in the main track of ICASSP 2026:


    Chain-of-Caption: Training-free improvement of multimodal large language model on referring expression comprehension, by Yik Lung Pang, Changjae Oh
    Consistency-aware learning for unbiased visual question answer, by Xinyu Jiang, Qiang Lu, Liang Zhao, Yunfei Long, Zhenfang Zhu, Jianyong Chai
    Making Dialogue Grounding Data Rich: A Three-Tier Data Synthesis Framework for Generalized Referring Expression Comprehension, by Juexi Shao, Siyou Li, Yujian Gan, Chris Madge, Vanja Karan, Massimo Poesio
    RAVE: Retrieval and Scoring Aware Verifiable Claim Detection, by Yufeng Li, Arkaitz Zubiaga
    Diffusion Timbre Transfer Via Mutual Information Guided Inpainting, by Ching Ho Lee, Javier Nistal, Stefan Lattner, Marco Pasini, George Fazekas
    Towards Effective Negation Modeling in Joint Audio-Text Models for Music, by Yannis Vasilakis, Rachel Bittner, Johan Pauwels
    Domain-Invariant Representation Learning of Bird Sounds, by Ilyass Moummad, Romain Serizel, Emmanouil Benetos, Nicolas Farrugia
    The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs, by Brandon Carone, Iran Roman, Pablo Ripollés
    Beat and Downbeat Detection: A Reformulated Approach, by James Bolt, Johan Pauwels, George Fazekas
    Learning Vocal-Tract Area and Radiation with a Physics-Informed Webster Model, by Minhui Lu, Joshua D. Reiss
    Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation, by Aditya Bhattacharjee, Marco Pasini, Emmanouil Benetos
    Audio-to-Score Jazz Solo Transcription with the Rhythm Perceiver, by Ivan Shanin, Xavier Riley, Simon Dixon


The following papers, which have been published in IEEE and EURASIP journals, will also be presented at the conference:


    Neural Audio Synthesis for Sound Effects: A Scope Review, by Mateo Cámara, Fernando Marcos, Anders Bargum, Cumhur Erkut, Joshua Reiss, José Luis Blanco
        Published in the IEEE Transactions on Audio, Speech and Language Processing
    Domain Adaptation of Few-Shot Bioacoustic Event Detection in Different Environments, by Yizhou Tan, Haojun Ai, Shengchen Li, György Fazekas
        Published in the IEEE Transactions on Audio, Speech and Language Processing
    Parameter optimisation for a physical model of the vocal system, by Mateo Cámara, José Luis Blanco, Joshua D. Reiss
        Published in the EURASIP Journal on Audio, Speech, and Music Processing
    Acoustic Prompt Tuning: Empowering Large Language Models With Audition Capabilities, by Jinhua Liang, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos
        Published in the IEEE Transactions on Audio, Speech and Language Processing
    Velocity2DMs: A Contextual Modeling Approach to Dynamics Marking Prediction in Piano Performance, by Hyon Kim, Emmanouil Benetos, Xavier Serra
        Published in the IEEE Signal Processing Letters



See you in Barcelona!</description>
            <category>Public news</category>
            <pubDate>Sun, 19 Apr 2026 23:00:00 +0100</pubDate>
            <guid>news5445</guid>
        </item>
        <item>
            <title>Centre for Multimodal AI at ICLR 2026</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/5429/centre-for-multimodal-ai-at-iclr-2026/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/e1b6924083744f1e06cfc014164defcc.jpg&quot; /&gt;

&lt;br&gt;On 23-27 April, CMAI researchers will participate in the Fourteenth International Conference on Learning Representations (ICLR 2026), taking place in Rio de Janeiro, Brazil. ICLR is the premier gathering of professionals dedicated to advancing the branch of artificial intelligence known as representation learning, generally referred to as deep learning.

The following papers authored or co-authored by CMAI members will be presented at the main track of ICLR 2026:


    Spectral Attention Steering for Prompt Highlighting, by Weixian Waylon Li, Yuchen Niu, Yongxin Yang, Keshuang Li, Tiejun Ma, Shay B Cohen
    SCRAPL: scattering transform with random paths for machine learning, by Christopher Mitcheltree, Vincent Lostanlen, Emmanouil Benetos, Mathieu Lagrange
    ViMo: A Generative Visual GUI World Model for App Agents, by Dezhao Luo, Bohan Tang, Kang Li, Georgios Papoudakis, Jifei Song, Shaogang Gong, Jianye Hao, Jun Wang, Kun Shao
    Beyond Linear Probes: Dynamic Safety Monitoring for Language Models, by James Oldfield, Philip Torr, Ioannis Patras, Adel Bibi, Fazl Barez
    CASteer: Cross-Attention Steering for Controllable Concept Erasure, by Tatiana Gaintseva, Andreea-Maria Oncescu, Chengcheng Ma, Ziquan Liu, Martin Benning, Gregory Slabaugh, Jiankang Deng, Ismail Elezi
    OmniVideoBench: towards audio-visual understanding evaluation for omni MLLMs, by Caorui Li, Yu Chen, Yiyan Ji, Jin Xu, Zhenyu Cui, Shihao Li, Yuanxing Zhang, Zhenghao Song, Dingling Zhang, Heying, Haoxiang Liu, Yuxuan Wang, Qiufeng Wang, Jiafu Tang, Zhenhe Wu, Jiehui Luo, Zhiyu Pan, Weihao Xie, Chenchen Zhang, Zhaohui Wang, Jiayi Tian, Yanghai Wang, Zhe Cao, Minxin Dai, Ke Wang, Runzhe Wen, Yinghao Ma, Yaning Pan, Sungkyun Chang, Termeh Taheri, Haiwen Xia, Christos Plachouras, Emmanouil Benetos, Yizhi Li, Ge Zhang, Jian Yang, Tianhao Peng, Zili Wang, Minghao Liu, Junran Peng, Zhaoxiang Zhang, Jiaheng Liu
    YuE: scaling open foundation models for long-form music generation, by Ruibin Yuan, Hanfeng Lin, Shuyue Guo, Ge Zhang, Jiahao Pan, Yongyi Zang, Haohe Liu, Yiming Liang, Wenye Ma, Xingjian Du, Xeron Du, Zhen Ye, Tianyu Zheng, Zhengxuan Jiang, Yinghao Ma, Minghao Liu, Zeyue Tian, Ziya Zhou, Liumeng Xue, Xingwei Qu, Yizhi Li, Shangda Wu, Tianhao Shen, Ziyang Ma, Jun Zhan, Chunhui Wang, Yatian Wang, Xiaowei Chi, Xinyue Zhang, Zhenzhu Yang, Xiangzhou Wang, Shansong Liu, Lingrui Mei, Peng Li, Junjie Wang, Jianwei Yu, Guojian Pang, Xu Li, Zihao Wang, Xiaohuan Zhou, Lijun Yu, Emmanouil Benetos, Yong Chen, Chenghua Lin, Xie Chen, Gus Xia, Zhaoxiang Zhang, Chao Zhang, Wenhu Chen, Xinyu Zhou, Xipeng Qiu, Roger Dannenberg, Jiaheng Liu, Jian Yang, Wenhao Huang, Wei Xue, Xu Tan, Yike Guo



The following paper authored by CMAI members will be presented at the ICLR 2026 Workshop on Lifelong Agents: 


    Beyond syntax: Action semantics learning for app agents, by Bohan Tang, Dezhao Luo, Jianheng Liu, Jingxuan Chen, Shaogang Gong, Jianye Hao, Jun Wang, Kun Shao



See you all at ICLR!</description>
            <category>Public news</category>
            <pubDate>Wed, 08 Apr 2026 23:00:00 +0100</pubDate>
            <guid>news5429</guid>
        </item>
        <item>
            <title>CMAI PhD Student Completes Research Fellowship at UK Parliament</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/5420/cmai-phd-student-completes-research-fellowship-at-uk-parliament/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/ddd5e375a6f7fdb2cf6e412a46347e8d.jpg&quot; /&gt;

&lt;br&gt;CMAI PhD student Alexander Williams recently completed a three-month research fellowship at the Parliamentary Office of Science and Technology (POST) following a successful application to the UKRI Policy Internship Scheme.

POST is an impartial research and knowledge exchange service based in the UK Parliament. It works to ensure cutting-edge research evidence and expertise are available to members of both Houses of Parliament (the House of Commons and the House of Lords), covering emerging and complex science and social science topics.

During the fellowship, Alex worked closely with POST's Physical Sciences and Digital Lead, Simon Brawley, to research and write a POSTnote—an impartial, accurate, and peer-reviewed briefing tailored for UK parliamentarians—on data centres and their sustainability.

Data centres are crucial infrastructure that underpins many aspects of modern life, including artificial intelligence. The POSTnote, titled What are data centres and how sustainable are they?, discusses what data centres are, their presence in the UK, and their impact on different aspects of sustainability. This briefing was produced in consultation with experts and stakeholders from academia, industry, government, and beyond, including interviews with Google, techUK, and academics from the University of Oxford, Loughborough University, and the University of Manchester.

The POSTnote can be read in full here.

Well done Alex!</description>
            <category>Public news</category>
            <pubDate>Sun, 29 Mar 2026 23:00:00 +0100</pubDate>
            <guid>news5420</guid>
        </item>
        <item>
            <title>Reimagining music videos with AI: CMAI research breaks new ground</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/5275/reimagining-music-videos-with-ai-cmai-research-breaks-new-ground/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/1d9355dcb7ccbd2e8acf2f603acb6fc0.jpg&quot; /&gt;

&lt;br&gt;Yinghao Ma, a PhD candidate in the Centre for Multimodal AI at Queen Mary University of London, has helped develop AutoMV, the first open-source AI system capable of generating complete music videos directly from full-length songs.

Music-to-video generation remains a major challenge for generative AI. While recent video models can produce visually impressive short clips, they often struggle with long-form storytelling, musical alignment, and character consistency. AutoMV addresses these limitations by introducing a multi-agent AI system designed specifically for full-length music video production.

Developed through a collaboration between Queen Mary researchers and partners at Beijing University of Posts and Telecommunications, Nanjing University, Hong Kong University of Science and Technology, and the University of Manchester, AutoMV brings together expertise in music information retrieval, multimodal AI, and creative computing. The work was led by Dr Emmanouil Benetos, with contributions from Yinghao Ma as well as Dr Changjae Oh and Chaoran Zhu from the Centre for Intelligent Sensing.

AutoMV works like a virtual film production team. First, it analyses a song's musical structure, beats, and time-aligned lyrics. Then, a set of specialised AI agents—taking on roles such as screenwriter, director, and editor—collaborate to plan scenes, maintain character identity, and generate images and video clips. A final quality-control &quot;verifier&quot; agent checks for coherence and consistency, regenerating content where needed.
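
To make that pipeline concrete, below is a minimal, self-contained Python sketch of the verify-and-regenerate pattern described above. Every name in it (analyse_song, screenwriter, director, verifier) is a hypothetical stand-in invented for this illustration, not AutoMV's actual API; see the code repository linked below for the real implementation.

    # Minimal sketch of a multi-agent pipeline with a quality-control loop.
    # All names are hypothetical stand-ins, not AutoMV's actual API.
    import random
    from dataclasses import dataclass

    @dataclass
    class Scene:
        section: str   # e.g. 'verse' or 'chorus'
        prompt: str    # what the 'director' agent should render

    @dataclass
    class Clip:
        scene: Scene
        quality: float  # stand-in for the generated video content

    def analyse_song(audio_path):
        '''Stage 1 (stub): extract structure, beats and time-aligned lyrics.'''
        return {'structure': ['intro', 'verse', 'chorus'], 'beats': [0.0, 0.5, 1.0]}

    def screenwriter(analysis):
        '''Agent (stub): plan one scene per song section.'''
        return [Scene(s, f'scene for the {s}') for s in analysis['structure']]

    def director(scene, beats):
        '''Agent (stub): shoot a clip; quality varies between attempts.'''
        return Clip(scene, quality=random.random())

    def verifier(clip):
        '''Agent (stub): accept a clip only if it is coherent enough.'''
        return clip.quality > 0.4

    def make_video(audio_path, max_retries=3):
        analysis = analyse_song(audio_path)
        clips = []
        for scene in screenwriter(analysis):
            clip = director(scene, analysis['beats'])
            # Regenerate any clip the verifier rejects, up to max_retries times.
            for _ in range(max_retries):
                if verifier(clip):
                    break
                clip = director(scene, analysis['beats'])
            clips.append(clip)
        return clips  # the real system would also cut these to the beat grid

    clips = make_video('song.wav')
    print(f'{len(clips)} clips generated')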

This approach allows AutoMV to produce music videos that follow a song from beginning to end, maintaining narrative flow and visual identity throughout. Human expert evaluations show that AutoMV significantly outperforms existing commercial tools, narrowing the gap between AI-generated videos and professionally produced music videos.

By lowering the cost of music video production from tens of thousands of pounds to roughly the cost of an API call, AutoMV has the potential to empower independent musicians, educators, and creators who previously lacked access to professional video production. As an open-source project, it also supports transparent, reproducible research and encourages community collaboration.

The team is actively inviting researchers and students to contribute to the codebase, extend the benchmark, and explore future directions for long-form, multimodal AI systems.


    Code: https://github.com/multimodal-art-projection/AutoMV
    Paper: https://arxiv.org/abs/2512.12196
    Project website: https://m-a-p.ai/AutoMV/</description>
            <category>Public news</category>
            <pubDate>Tue, 06 Jan 2026 00:00:00 +0000</pubDate>
            <guid>news5275</guid>
        </item>
        <item>
            <title>Women in Higher Education Network plus grant</title>
            <link>https://www.seresearch.qmul.ac.uk/chcc/news/5458/women-in-higher-education-network-plus-grant/</link>
            <description>Ekaterina Ivanova and Anna Xambo Sedo have been awarded a grant from the QMUL Erica fund worth £13,200 to support the Women in Higher Education Network plus (WHEN+), until July 2027.


The Women in Higher Education Network (WHEN) was founded in 2023 at EECS with the aim of building a strong and sustainable community for individuals identifying as women, while contributing to advancing diversity, equity, and inclusion in STEM. Over the past two years, under the leadership of Ekaterina Ivanova and Anna Xambó Sedó, WHEN has established a solid foundation for community engagement through a dedicated website, mailing list, and LinkedIn group. The network has grown from 38 active participants in its first year to 88 in the second, and now exceeds 100 registered members with the inclusion of SBBS and collaboration with ITS Women in Tech. To date, WHEN has delivered 30 monthly events, including workshops, talks, and social gatherings. These activities have received consistently positive feedback (July 2024 survey) and have fostered strong peer support, with members actively proposing new initiatives.

With this new extension funding from ERICA, they aim to increase the project's impact through:


    widening inclusion by collaborating with other underrepresented groups, including LGBTQIA+ communities;
    expanding participation across the Faculty of Engineering; and
    extending the network geographically from QMUL to the wider London academic community, supporting interdisciplinary exchange.


Following their already established series of regular but distinctive events, the monthly activities will include social events and skill development events. Social events include coffee mornings, cinema screenings with discussion, tea parties, social lunches, sports (e.g. yoga, zumba), and so on. Skill development events include group coaching, presentations, workshops, meetups, and any relevant activity/idea proposed by and/or powered by network members.

See the WHEN website for past activities.</description>
            <category>Public news</category>
            <pubDate>Thu, 01 Jan 2026 00:00:00 +0000</pubDate>
            <guid>news5458</guid>
        </item>
        <item>
            <title>CMAI at NeurIPS 2025</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/5219/cmai-at-neurips-2025/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/a20d19fb2ed10997b9bc578028722b0a.jpg&quot; /&gt;

&lt;br&gt;On 2-7 December, several CMAI researchers will participate in the 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025), taking place in San Diego. NeurIPS is a prestigious annual academic conference, run by a non-profit foundation, that fosters the exchange of research in artificial intelligence (AI), machine learning (ML), and computational neuroscience.

CMAI members will be presenting the following papers at the main track of NeurIPS 2025:


    Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
        James Oldfield, Shawn Im, Sharon Li, Mihalis Nicolaou, Ioannis Patras, Grigorios Chrysos
        https://openreview.net/forum?id=jcvX8XFNqX
    Compress &amp; Cache: Vision token compression for efficient generation and retrieval
        Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos
        https://openreview.net/forum?id=nGEq3D6FFX
    ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs
        Michal Nazarczuk, Sibi Catley-Chandar, Thomas Tanay, Zhensong Zhang, Gregory Slabaugh, Eduardo Pérez-Pellitero
        https://openreview.net/forum?id=mLVqiNH0aA
    Large language models can learn and generalize steganographic chain-of-thought under process supervision
        Robert McCarthy, Joey Skaf, Luis Ibanez-Lissen, Vasil Georgiev, Connor Watts, Hannes Whittingham, Lorena Gonzalez-Manzano, Cameron Tice, Edward James Young, Puria Radmard, David Lindner
        https://openreview.net/forum?id=2g5cJqX15Y
    Uncertainty-quantified Rollout Policy Adaptation for Unlabelled Cross-domain Video Temporal Grounding
        Jian Hu, Zixu Cheng, Shaogang Gong, Isabel Guan, Jianye Hao, Jun Wang, Kun Shao
        https://openreview.net/forum?id=RfNiN2rENM
    CALM: Culturally Self-Aware Language Models
        Lingzhi Shen, Xiaohao Cai, Yunfei Long, Imran Razzak, Guanming Chen, Shoaib Jameel
        https://openreview.net/forum?id=16QYhVFvrO
    λ-Orthogonality Regularization for Compatible Representation Learning
        Simone Ricci, Niccolò Biondi, Federico Pernici, Ioannis Patras, Alberto Del Bimbo
        https://openreview.net/forum?id=Due3iZPa6u


The following papers will be presented at the Datasets and Benchmarks track of NeurIPS 2025:


    OmniBench: Towards The Future of Universal Omni-Language Models
        Yizhi Li, Ge Zhang, Yinghao Ma, Ruibin Yuan, King Zhu, Hangyu Guo, Yiming Liang, Jiaheng Liu, Zekun Moore Wang, Jian Yang, Siwei Wu, Xingwei Qu, Jinjie Shi, Xinyue Zhang, Zhenzhu Yang, Yidan Wen, Yanghai Wang, Shihao Li, Zhaoxiang Zhang, Ruibo Liu, Emmanouil Benetos, Wenhao Huang, Chenghua Lin
        https://openreview.net/forum?id=SSF4qgsNYE
    MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
        Ziyang Ma, Yinghao Ma, Yanqiao Zhu, Chen Yang, Yi-Wen Chao, Ruiyang Xu, Wenxi Chen, Yuanzhe Chen, Zhuo Chen, Jian Cong, Kai Li, Keliang Li, Siyou Li, Xinfeng Li, Xiquan Li, Zheng Lian, Yuzhe Liang, Minghao Liu, Zhikang Niu, Tianrui Wang, Yuping Wang, Yuxuan Wang, Yihao Wu, Guanrou Yang, Jianwei Yu, Ruibin Yuan, Zhisheng Zheng, Ziya Zhou, Haina Zhu, Wei Xue, Emmanouil Benetos, Kai Yu, Eng Siong Chng, Xie Chen
        https://openreview.net/forum?id=fgmrBJemlQ
    XIFBench: Evaluating Large Language Models on Multilingual Instruction Following
        Zhenyu Li, Kehai Chen, Yunfei Long, Xuefeng Bai, Yaoyin Zhang, Xuchen Wei, Juntao Li, Min Zhang
        https://openreview.net/forum?id=qkdVjCAPOE


The following paper will be presented at the Creative AI track of NeurIPS 2025:


    The Ghost in the Keys: A Disklavier Demo for Human-AI Musical Co-Creativity
        Louis Bradshaw, Alexander Spangher, Stella Biderman, Simon Colton
        https://openreview.net/forum?id=3yeBer3J5z



See you all at NeurIPS!</description>
            <category>Public news</category>
            <pubDate>Mon, 17 Nov 2025 00:00:00 +0000</pubDate>
            <guid>news5219</guid>
        </item>
        <item>
            <title>CMAI PhD student awarded Google PhD Fellowship</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/5187/cmai-phd-student-awarded-google-phd-fellowship/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/4ac60065bd12d76e2e9830ee1a3f805a.jpg&quot; /&gt;

&lt;br&gt;We are extremely proud to announce that Yinghao Ma, a PhD student in AI and Music at the Centre for Multimodal AI of QMUL supervised by Dr Emmanouil Benetos, has been awarded the 2025 Google PhD Fellowship in Machine Perception.

A Google spokesperson said: &quot;The student nominations we received this year were exemplary in their quality, but Yinghao especially stood out and was endorsed by the research scientists and distinguished engineers within Google who participated in the review. Congratulations to Yinghao on this well-deserved recognition, it's an honor to support such incredibly talented students.&quot;

Yinghao's PhD research focuses on advancing Large Language Models (LLMs) for music understanding and generation. Specifically, he studies how multimodal models can integrate audio, symbolic, and textual information to understand, reason about, and generate music.

Together with colleagues, he developed MERT, a large-scale music audio representation model that has seen more than 10k monthly downloads over the past three years. His recent work includes developing music instruction-following datasets and benchmarks that help evaluate how well AI systems can comprehend and create music.

He said: &quot;It's my great honour to receive the Google PhD Fellowship, which recognises my research and will strongly contribute to my future career. I'm deeply grateful to Google and QMUL for their support, providing good platforms for AI &amp; music research.&quot;

Congratulations Yinghao!</description>
            <category>Public news</category>
            <pubDate>Wed, 22 Oct 2025 23:00:00 +0100</pubDate>
            <guid>news5187</guid>
        </item>
        <item>
            <title>From biodiversity to artificial intelligence</title>
            <link>https://www.seresearch.qmul.ac.uk/news/5156/from-biodiversity-to-artificial-intelligence/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/f9700750a2b9f995526d7e6854e85cb0.jpg&quot; /&gt;

&lt;br&gt;We showcase the research work of Kabiru Abubakari, a PhD student in the Centre for Probability, Statistics and Data Science, and the research work of Dr David Mguni, a lecturer in the Centre for Multimodal AI.

 

Kabiru Abubakari

Kabiru's research focuses on Bayesian spatial modelling for biodiversity.

His PhD project is devoted to developing and applying Bayesian spatial and spatio-temporal modelling techniques to enhance understanding of the association between plant species at risk of extinction and areas in need of protection in the face of climate change, changing land use (especially agriculture), and pollution. Working together with his supervisors — Prof Silvia Liverani (SMS), Prof Andrew Leitch (SBBS), and Dr Ilia Leitch (Royal Botanic Garden, Kew) — Kabiru combines statistical modelling and ecology to develop methods that better capture uncertainty in biodiversity data.

His academic journey began with a degree in Economics at the University for Development Studies (UDS) in Tamale, Ghana, where he graduated in 2020. Since joining Queen Mary, Kabiru has also been very active in supporting students of Black heritage as a tutor in Levelling Up Maths and as a panellist at the Black Heroes of Mathematics Conference.

Read more about Kabiru's research in this poster.

 

David Mguni

David is a Lecturer in Artificial Intelligence. His research spans reinforcement learning, game theory, and optimal control, with a focus on developing self-improving, cooperative learning systems. His work contributes to a broader vision of building AI that can reason, adapt, and learn autonomously in an open-ended world.

Together with his PhD student Yaqi Sun and master's students, David is working towards one of the grand goals of artificial intelligence: creating systems that can not only learn from existing training data but also learn how to learn and invent their own challenges. The group's research on the Recursive Meta-Learning Framework explores how intelligent systems can evolve their own learning rules and generate and solve new problems that push them beyond the limits of human-derived data.

A central focus of the group's work is reinforcement learning — particularly understanding how multiple intelligent systems can cooperate, compete, and coordinate in open, dynamic environments. The group's research seeks to overcome the limitations of traditional reinforcement learning algorithms by enabling AI to learn the rules of learning itself.

This approach has far-reaching implications. By allowing AI systems to invent new challenges, discover hidden structures, and maintain stability as they learn together, the research moves toward the long-term goal of artificial general intelligence: machines capable of generalising knowledge, adapting creatively, and cooperating safely across domains. Possible applications range from AI programs that autonomously generate novel mathematical proofs to agents that continually refine their understanding of molecular structures for drug discovery.

The group's work blends theory with practical experimentation, drawing on dynamical systems, game theory, category theory, stochastic control, and variational optimisation. These mathematical foundations ensure that the learning mechanisms they develop are not only powerful and flexible but also grounded in principles that make them interpretable, stable, and safe.

To learn more about David's research, read also here.</description>
            <category>Public news</category>
            <pubDate>Wed, 15 Oct 2025 23:00:00 +0100</pubDate>
            <guid>news5156</guid>
        </item>
        <item>
            <title>Best student paper and outstanding reviewer awards at ISMIR 2025</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/5098/best-student-paper-and-outstanding-reviewer-awards-at-ismir-2025/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/0d9bd1939a6f78714877cb0f2df8af8a.jpg&quot; /&gt;

&lt;br&gt;We are delighted to share that CMAI PhD student Ben Hayes, along with CMAI academics Charalampos Saitis and George Fazekas, has received the best student paper award at the ISMIR 2025 conference.

The paper &quot;Audio Synthesizer Inversion in Symmetric Parameter Spaces With Approximately Equivariant Flow Matching&quot; proposes using permutation equivariant continuous normalizing flows to handle the ill-posed problem of audio synthesizer inversion, where multiple parameter configurations can produce identical sounds due to intrinsic symmetries in synthesizer design. By explicitly modeling these symmetries, particularly permutation invariance across repeated components like oscillators and filters, the method outperforms both regression-based approaches and symmetry-naive generative models on synthetic tasks and on a real-world synthesizer (Surge XT).
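
For readers unfamiliar with the problem, the toy Python sketch below illustrates the symmetry at the heart of the paper: swapping two identical oscillators changes the parameter vector but not the audio, so a naive regressor averages over equally valid targets. All names here are invented for this illustration; the paper resolves the ambiguity with approximately equivariant flow matching, not the crude canonical sort shown at the end.

    # Toy illustration of parameter-space symmetry in a hypothetical
    # two-oscillator sine synth (not the paper's code).
    import math

    def synth(params):
        '''Render a few samples from two interchangeable sine oscillators.'''
        f1, a1, f2, a2 = params
        return [a1 * math.sin(f1 * t) + a2 * math.sin(f2 * t)
                for t in (0.0, 0.1, 0.2, 0.3)]

    p = (440.0, 0.5, 660.0, 0.3)
    p_swapped = (660.0, 0.3, 440.0, 0.5)  # same oscillators, exchanged
    assert synth(p) == synth(p_swapped)   # identical audio, distinct parameters

    # A regressor trained on both labellings is pulled towards their mean,
    # which synthesises the wrong sound. The simplest mitigation is to pick
    # a canonical ordering of the repeated components:
    def canonicalise(params):
        oscillators = sorted([params[0:2], params[2:4]])  # order by frequency
        return oscillators[0] + oscillators[1]

    assert canonicalise(p) == canonicalise(p_swapped)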

We are also happy to share that two CMAI PhD students, Yannis Vasilakis and Ben Hayes, were recognised as outstanding reviewers.</description>
            <category>Public news</category>
            <pubDate>Sun, 05 Oct 2025 23:00:00 +0100</pubDate>
            <guid>news5098</guid>
        </item>
        <item>
            <title>CMAI at WASPAA 2025</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/5103/cmai-at-waspaa-2025/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/6871445306f9af07141444a39ae524a8.jpg&quot; /&gt;

&lt;br&gt;On 12-15 October, several CMAI researchers will participate in the 2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, taking place at the Granlibakken Tahoe Resort near Lake Tahoe, in Tahoe City, CA, USA. WASPAA is a premier event in the field of audio signal processing, organised by the IEEE's Audio and Acoustic Signal Processing (AASP) technical committee, with a strong focus on music signal processing and computational sound scene analysis.

The Centre for Multimodal AI, as in previous years, will have a strong presence at WASPAA 2025.

In the Technical Programme, the following papers are authored by CMAI members:


    Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach (Adrian S. Roman, Iran R. Roman, Juan Pablo Bello)
    RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection (Sungkyun Chang, Simon Dixon, Emmanouil Benetos)
    Modulation Discovery with Differentiable Digital Signal Processing (Christopher Mitcheltree, Hao Hao Tan, Joshua D. Reiss)
    Beyond Architecture: The Critical Impact of Inference Overlap on Music Source Separation Benchmarks (Harnick Khera, Johan Pauwels, Alan W. Archer-Boyd, Mark B. Sandler)
    Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior (Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Yuki Mitsufuji, George Fazekas)
    Self-Supervised Representation Learning with a JEPA Framework for Multi-instrument Music Transcription (Mary Pilataki, Matthias Mauch, Simon Dixon)


In the Demo Session, the following demos will be presented by C4DM members:


    Neural Audio Synthesis for Non-Keyboard Instruments (Franco Caspe, Andrew McPherson, Mark Sandler)
    PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space (Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Yuki Mitsufuji, George Fazekas)



See you at WASPAA!</description>
            <category>Public news</category>
            <pubDate>Sun, 05 Oct 2025 23:00:00 +0100</pubDate>
            <guid>news5103</guid>
        </item>
        <item>
            <title>Game AI group celebrates multiple successes at IEEE conference on games 2025</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/5053/game-ai-group-celebrates-multiple-successes-at-ieee-conference-on-games-2025/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/92cf6fba34d7de7de9a0d4bc8f20ded6.jpg&quot; /&gt;

&lt;br&gt;Queen Mary University of London's Game AI Group had a standout presence at the prestigious IEEE Conference on Games (IEEE CoG) 2025, held 26–29 August in Milan. The conference is one of the world's leading venues for research on video games, board games and game-related technologies.

The group published an impressive five papers (three full and two short), with two full papers nominated for the Best Paper Award. The paper &quot;Bootstrap Your Own Teacher: Online Policy Distillation for Multi-Game Reinforcement Learning&quot; – led by Donal Byrne and co-authored by colleagues including Queen Mary's Marko Tot – went on to win the award, marking a major achievement for the team. Marko, an IGGI PhD student in the Game AI Group, carried out this work while on placement at InstaDeep.

The team also celebrated recognition beyond their own papers. Simon Lucas, Professor of Artificial Intelligence and head of the Game AI Group, was presented with an Outstanding Contribution Award for co-founding the IEEE Conference on Games in 2005 (originally the IEEE Symposium on Computational Intelligence and Games), and for founding the IEEE Transactions on Games journal.

Contributions at IEEE CoG 2025:


    Full Papers
        JSON-Bag: A generic game trajectory representation – Dien Nguyen, Diego Perez Liebana and Simon Lucas
        (Best Paper Nomination) How Task Complexity Moderates the Impact of AI-Generated Images on User Experience in Gamified Text Labelling – Fatima Althani, Chris Madge and Massimo Poesio
        (Best Paper Nomination and Winner) Bootstrap Your Own Teacher: Online Policy Distillation for Multi-Game Reinforcement Learning – Donal Byrne, Marko Tot, Paul Duckworth, Clement Bonnet, Alexandre Laterre and Thomas Barrett
    Short Papers
        Play-Style Identification Using Low-Level Representations of Play Traces in MicroRTS – Ruizhe Yu Xia, Jeremy Gow and Simon Lucas
        Constraint Propagation for Reasoning in Single-player Deduction Games – Fandi Meng, Kaijie Xu and Simon Lucas
    Competitions
        PlanetWars AI Challenge – Simon Lucas
        Tabletop Games Balancing Competition – George Long

These successes showcase the Queen Mary Game AI Group's international reputation at the forefront of artificial intelligence and games research, and their continuing influence in shaping the future of the field. 

And a big thank you to our IGGI CDT for their help and support with this.</description>
            <category>Public news</category>
            <pubDate>Wed, 10 Sep 2025 23:00:00 +0100</pubDate>
            <guid>news5053</guid>
        </item>
        <item>
            <title>CMAI at ISMIR 2025</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/5038/cmai-at-ismir-2025/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/1f63c29bf8c9e3effafc2164312893eb.jpg&quot; /&gt;

&lt;br&gt;On 21-25 September 2025, several CMAI researchers will participate in the 26th International Society for Music Information Retrieval Conference (ISMIR 2025). ISMIR is the leading conference in the field of music informatics, and is currently the top-cited publication venue for Music &amp; Musicology (source: Google Scholar). This year, ISMIR will take place onsite in Daejeon, Korea.

Similar to previous years, the Centre for Multimodal AI will have a strong presence at ISMIR 2025.


In the Scientific Programme, the following papers are authored/co-authored by CMAI members:


    Audio Synthesizer Inversion in Symmetric Parameter Spaces with Approximately Equivariant Flow Matching (Ben Hayes, Charalampos Saitis, György Fazekas)
    SLAP: Siamese Language Audio Pretraining without Negative Samples for Music Understanding (Julien Guinot, Alain Riou, Elio Quinton, György Fazekas)
    GD-Retriever: Controllable Generative Text Music Retrieval with Diffusion Models (Julien Guinot, Elio Quinton, György Fazekas)
    Instruct-MusicGen: Unlocking Text to Music Editing for Music Language Models via Instruction Tuning (Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon)
    Scaling Self Supervised Representation Learning for Symbolic Piano Performance (Louis Bradshaw, Honglu Fan, Alexander Spangher, Stella Biderman, Simon Colton)
    Codicodec: Unifying Continuous and Discrete Compressed Representations of Audio (Marco Pasini, Stefan Lattner, György Fazekas)
    MIDI-VALLE: Improving Expressive Piano Performance Synthesis through Neural Codec Language Modelling (Jingjing Tang, Xin Wang, Zhe Zhang, Junichi Yamagishi, Geraint Wiggins, György Fazekas)
    Universal Music Representations? Evaluating Foundation Models on World Music Corpora (Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos)
    Perceptual Errors in Music Source Separation: Looking Beyond SDR Averages (Saurjya Sarkar, Victoria Moomjian, Basil Woods, Emmanouil Benetos, Mark Sandler)
    GOAT: a Large Dataset of Paired Guitar Audio Recordings and Tablatures (Jackson Loth, Pedro Sarmento, Saurjya Sarkar, Zixun Guo, Mathieu Barthet, Mark Sandler)
    CMI-Bench: a Comprehensive Benchmark for Evaluating Music Instruction Following (Yinghao Ma, Siyou Li, Juntao Yu, Emmanouil Benetos, Akira Maezawa)
    Assessing the Alignment of Audio Representations with Timbre Similarity Ratings (Haokun Tian, Stefan Lattner, Charalampos Saitis)
    Improving Neural Pitch Estimation with SWIPE Kernels (David Marttila, Joshua D. Reiss)
    Refining Music Sample Identification with a Self Supervised Graph Neural Network (Aditya Bhattacharjee, Ivan Meresman Higgs, Mark Sandler, Emmanouil Benetos)



The following Tutorials will be co-presented by CMAI PhD students Rodrigo Diaz and Julien Guinot:


    Differentiable Physical Modeling Sound Synthesis: Theory, Musical Application, and Programming (Jin Woo Lee, Stefan Bilbao, Rodrigo Diaz)
    Self-supervised Learning for Music - An Overview and New Horizons (Julien Guinot, Alain Riou, Yuexuan Kong, Marco Pasini, Gabriel Meseguer-Brocal, Stefan Lattner)



The following journal papers, published in TISMIR and co-authored by CMAI members, will be presented at the conference:


    Predicting Eurovision Song Contest Results: A Hit Song Science Approach (Katarzyna Adamska, Joshua Reiss)
    The GigaMIDI Dataset with Features for Expressive Music Performance Detection (Keon Ju Lee, Jeff Ens, Sara Adkins, Pedro Sarmento, Mathieu Barthet, Philippe Pasquier)



As part of the MIREX public evaluations:


    CMAI PhD student Yinghao Ma is task captain for the Music Reasoning QA, Audio Beat Tracking, and Audio Key Detection tasks
    CMAI PhD student Huan Zhang is task captain for RenCon 2025: Expressive Performance Rendering Competition


Finally, on the organisational side:


    CMAI PhD student Chin-Yun Yu is Virtual Co-Chair for the ISMIR 2025 conference.
    CMAI PhD student Yinghao Ma is co-organising the satellite workshop LLM4MA: Large Language Models for Music &amp; Audio



See you in Daejeon!</description>
            <category>Public news</category>
            <pubDate>Sun, 07 Sep 2025 23:00:00 +0100</pubDate>
            <guid>news5038</guid>
        </item>
        <item>
            <title>CMAI organises AES AIMLA 2025 conference</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/5033/cmai-organises-aes-aimla-2025-conference/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/4c3e3835a3bc6ad9f7d8a77915c8904c.jpg&quot; /&gt;

&lt;br&gt;The AES International Conference on Artificial Intelligence and Machine Learning for Audio (AIMLA 2025) will be hosted by the Centre for Multimodal AI of Queen Mary University of London and will take place on 8-10 September 2025.

Several CMAI members are involved in the organisation of the conference, including but not limited to:


    Josh Reiss (General Chair)
    George Fazekas (Papers Co-chair)
    Soumya Vanka (Special Sessions Co-Chair)
    Franco Caspe (Special Sessions Co-Chair)
    Farida Yusuf (Sponsorship Chair)
    Emmanouil Benetos (Publicity Chair)
    Nelly Garcia (Social Events Coordinator)
    Ilias Ibnyahya (Treasurer)
    Chin-Yun Yu (Late Breaking Papers Chair)
    Marikaiti Primenta (Invited Speakers Chair)


CMAI members will also contribute several papers and presentations at AIMLA. The following peer-reviewed papers will be presented at the conference:


    NablAFx: A Framework for Differentiable Black-box and Gray-box Modeling of Audio Effects, by Marco Comunità, Christian Steinmetz, Joshua Reiss
    Transfer Learning for Neural Modelling of Nonlinear Distortion Effects, by Tara Vanhatalo, Pierrick Legrand, Myriam Desainte-Catherine, Pierre Hanna, Guillaume Pille, Antoine Brusco, Joshua Reiss
    Sound Matching an Analogue Levelling Amplifier Using the Newton-Raphson Method, by Chin-Yun Yu, George Fazekas
    Procedural Music Generation Systems in Games, by Shangxuan Luo, Joshua Reiss
    Neutone SDK: An Open Source Framework for Neural Audio Processing, by Christopher Mitcheltree, Bogdan Teleaga, Andrew Fyfe, Naotake Masuda, Matthias Schäfer, Alfie Bradic, Nao Tokui


The following late-breaking posters from CMAI members will be presented at AIMLA:


    Transformer-Based Sustain Pedal Reconstruction for Expressive Piano Performance MIDI, by Wenhao Liu, George Fazekas, Jingjing Tang
    Decoding Melodic Acoustic Features from Neural Data, by Zorka Bozilovic, Iran Roman
    Towards Intelligent Music Education: Score-Informed Transcription and Performance Assessment, by Jack Loth, Marikaiti Primenta, Jingjing Tang, Xavier Riley, Simon Dixon, Emmanouil Benetos


Last but not least, the following tutorial will be co-presented by CMAI PhD student Franco Caspe:


    Real-Time Neural Audio Inference, by Franco Caspe and Jatin Chowdhury


See you in London!</description>
            <category>Public news</category>
            <pubDate>Tue, 02 Sep 2025 23:00:00 +0100</pubDate>
            <guid>news5033</guid>
        </item>
        <item>
            <title>CMAI student to join the Alan Turing Institute in 2025-2026</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/5015/cmai-student-to-join-the-alan-turing-institute-in-2025-2026/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/84457edde2978607b183e69bea1840e1.jpg&quot; /&gt;

&lt;br&gt;CMAI PhD student Aditya Bhattacharjee has been awarded an enrichment placement by the Alan Turing Institute, the UK's national institute for data science and artificial intelligence, enabling him to join and interact with institute researchers and its community in the 2025/26 academic year. Aditya is supervised by Dr Emmanouil Benetos and will be entering the final year of his PhD. Aditya's placement will be hosted by the Turing's Fundamental Research in Data Science and AI programme.

Congratulations to Aditya!</description>
            <category>Public news</category>
            <pubDate>Tue, 29 Jul 2025 23:00:00 +0100</pubDate>
            <guid>news5015</guid>
        </item>
        <item>
            <title>CMAI at IJCNN 2025 conference</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/4995/cmai-at-ijcnn-2025-conference/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/a58a8504bc44da5ee614b02910398e33.jpg&quot; /&gt;

&lt;br&gt;On 30 June - 5 July 2025, CMAI researchers will participate in the IEEE International Joint Conference on Neural Networks (IJCNN 2025), the flagship conference of the IEEE Computational Intelligence Society and the International Neural Network Society.

The Centre for Multimodal AI will have a strong presence at the conference. The following papers authored/co-authored by CMAI members will be presented at IJCNN 2025:


    VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning, by Alexandros Xenos, Niki Foteinopoulou, Ioanna Ntinou, Ioannis Patras, Georgios Tzimiropoulos
    ImprovNet - Generating Controllable Musical Improvisations with Iterative Corruption Refinement, by Keshav Bhandari, Sungkyun Chang, Tongyu Lu, Fareza Rahman Enus, Louis Bradshaw, Dorien Herremans, Simon Colton
    Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation, by Jincheng Zhang, George Fazekas, Charalampos Saitis
    Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks, by Christos Plachouras, Julien Guinot, George Fazekas, Elio Quinton, Emmanouil Benetos, Johan Pauwels



The following presentation from CMAI members will also be made at IJCNN 2025:


    Split Fine-Tuning of BERT-based Music Models in the Edge-Cloud Continuum: An Empirical Analysis, by Bradley Aldous, Wai Fong Tam, Ahmed M. A. Sayed


 

See you in Rome!</description>
            <category>Public news</category>
            <pubDate>Fri, 13 Jun 2025 23:00:00 +0100</pubDate>
            <guid>news4995</guid>
        </item>
        <item>
            <title>CMAI best paper award at EvoMUSART 2025</title>
            <link>https://www.seresearch.qmul.ac.uk/cmai/news/4970/cmai-best-paper-award-at-evomusart-2025/</link>
            <description>&lt;img src=&quot;https://www.seresearch.qmul.ac.uk/content/news/images/b77da60f8aff9865d0a75e25f6644e22.jpg&quot; /&gt;

&lt;br&gt;The 14th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART), part of Evostar, took place in Trieste, Italy, between 23 and 25 April 2025.

We are pleased to announce that the following paper, first-authored by CMAI PhD student Keshav Bhandari, received the best paper award!

Yin-Yang: Developing Motifs With Long-Term Structure And Controllability, Keshav Bhandari, Geraint A. Wiggins, Simon Colton

Yin-Yang is a neuro-symbolic framework that combines three transformer models to generate structured melodies with coherent long-term development, while allowing user control over musical themes and variations.</description>
            <category>Public news</category>
            <pubDate>Sun, 11 May 2025 23:00:00 +0100</pubDate>
            <guid>news4970</guid>
        </item>
    </channel>
</rss>