
The Leaderboard Illusion

Centre for Human-Centred Computing, Centre for Multimodal AI

Date: 22 May 2025
Time: 16:00 - 17:00

Location: Ada Lovelace meeting room (Alan Turing Institute) & online

On Thursday 22nd May, 4-5pm (GMT), Shivalika Singh (Research Scientist) and Yiyang Nan (Research Scholar) from the Cohere Open Science Research Team will present their talk "The Leaderboard Illusion". The event will be held online and at the Alan Turing Institute.

Date: Thursday, 22nd May
Time: 4-5pm (GMT)
Location: Ada Lovelace meeting room (Alan Turing Institute) & online
Zoom: turing-uk.zoom.us/j/6687887536

Title: The Leaderboard Illusion
Paper: arxiv.org/abs/2504.20879
Presenters: Shivalika Singh & Yiyang Nan

Abstract:
Chatbot Arena, a widely used benchmark for ranking large language models, has significant systemic issues that distort fair evaluation. The study finds that private testing and selective disclosure, such as Meta testing 27 LLM variants before releasing Llama 4, give major providers an unfair advantage. Providers such as OpenAI and Google also receive disproportionately high exposure on the platform, each accounting for around 20% of total data, while 83 open-weight models collectively receive less than 30%. This imbalance in data access enables relative performance gains of up to 112% through overfitting to Arena-specific dynamics rather than improvements in true model quality. The authors call for reforms to ensure transparency and fairness in benchmarking practices.

Contact:

Prof. Maria Liakata

Updated by: Emmanouil Benetos