Events
How to Choose the Best Model from a Large Language Model Zoo?
Centre for Networks, Communications and Systems
Date: 26 November 2025
Time: 15:00 - 16:00
Location: Mile End (room to be confirmed closer to the date)
Registration Link: https://forms.cloud.microsoft/e/qyVTHuH9X6
Please express your interest by 18/11/2025.
Title: How to Choose the Best Model from a Large Language Model Zoo?
Speaker: Prof. Shiqiang Wang
Abstract: Open-weight large language model (LLM) zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs. In this talk, I will share some of our recent progress towards choosing the best model in the presence of such competing interests. I will first introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing while providing rigorous service level agreement (SLA) compliance guarantees. MESS+ learns request satisfaction probabilities of LLMs in real-time as users interact with the system, based on which model selection decisions are made by solving a per-request optimization problem. Then, considering more sophisticated systems with lightweight local LLMs for processing simple tasks at high speed and large-scale cloud LLMs for handling multi-modal data sources, I will present TMO, a local-cloud LLM inference system with "Three-M" Offloading: Multi-modal, Multi-task, and Multi-dialogue. TMO leverages a strategy based on reinforcement learning (RL) to optimize the inference location and multi-modal data sources to use for each task/dialogue, aiming to maximize the long-term reward while adhering to resource constraints. Finally, I will conclude the talk by outlining some future directions.
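To give a feel for the kind of per-request decision described in the abstract, the sketch below shows a simplified cost-aware router that learns each model's satisfaction probability online and picks the cheapest model expected to meet a service-level target. It is an illustrative toy only, not the MESS+ or TMO algorithms presented in the talk; the model names, costs, and quality values are hypothetical.

```python
# Illustrative sketch only: a simplified cost-aware LLM router in the spirit of
# the routing problem described in the abstract. Model names, costs, and
# feedback signals are hypothetical; this is NOT the published MESS+ algorithm.
import random


class SimpleRouter:
    def __init__(self, model_costs, sla_target=0.9):
        # model_costs: dict mapping model name -> per-request cost (arbitrary units)
        self.costs = model_costs
        self.sla_target = sla_target
        # Running estimates of each model's request-satisfaction probability,
        # started from an optimistic prior so every model gets tried.
        self.successes = {m: 1 for m in model_costs}
        self.attempts = {m: 1 for m in model_costs}

    def estimate(self, model):
        return self.successes[model] / self.attempts[model]

    def route(self):
        # Per-request decision: cheapest model whose estimated satisfaction
        # probability meets the SLA target; otherwise fall back to the model
        # with the best current estimate.
        feasible = [m for m in self.costs if self.estimate(m) >= self.sla_target]
        if feasible:
            return min(feasible, key=lambda m: self.costs[m])
        return max(self.costs, key=self.estimate)

    def feedback(self, model, satisfied):
        # Update the satisfaction estimate from observed user feedback.
        self.attempts[model] += 1
        self.successes[model] += int(satisfied)


if __name__ == "__main__":
    router = SimpleRouter({"small-llm": 1.0, "medium-llm": 3.0, "large-llm": 10.0})
    # Hypothetical "true" satisfaction rates used only to simulate feedback.
    true_quality = {"small-llm": 0.70, "medium-llm": 0.92, "large-llm": 0.98}
    for _ in range(1000):
        choice = router.route()
        router.feedback(choice, random.random() < true_quality[choice])
    print({m: round(router.estimate(m), 2) for m in router.costs})
```

In this toy setting the router gradually abandons the cheap model once its estimated satisfaction rate falls below the target and settles on the cheapest model that still meets it; the actual MESS+ formulation instead solves a per-request stochastic optimization with formal SLA guarantees, as outlined in the abstract.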
Speaker short bio: Shiqiang Wang is a Professor of Artificial Intelligence in the Department of Computer Science, University of Exeter, United Kingdom. He was a researcher at the IBM T. J. Watson Research Center, NY, United States, until October 2025. He received his Ph.D. from Imperial College London, United Kingdom, in 2015. His research focuses on the intersection of artificial intelligence (AI), distributed computing, and optimization, with a broad range of applications including large language models (LLMs), agentic AI, efficient model training and inference, and AI in distributed systems. He has made foundational contributions to edge computing and federated learning that have generated both academic and industrial impact. Dr. Wang serves as an associate editor of the IEEE Transactions on Mobile Computing and the IEEE Transactions on Parallel and Distributed Systems. He has served as an area chair of major AI and machine learning conferences, including AAAI, ICLR, ICML, and NeurIPS. He received the IEEE Communications Society (ComSoc) Leonard G. Abraham Prize in 2021, the IEEE ComSoc Best Young Professional Award in Industry in 2021, IBM Outstanding Technical Achievement Awards (OTAA) in 2019, 2021, 2022, and 2023, multiple Invention Achievement Awards from IBM since 2016, the Best Paper Runner-Up of ACM MobiHoc 2025, and the Best Student Paper Award of the Network and Information Sciences International Technology Alliance (NIS-ITA) in 2015.
