Events
Dave Dai: Cheminformatics and Machine Learning Approaches for GPCR Computer-Aided Drug Design
Centre for Experimental Physics and Quantum Technology
Centre for Chemical Research
Date: 9 December 2025
Time: 15:00 - 16:00
Location: GO Jones 516
Dave Dai from Arianna Fornili's research group will give a talk entitled: Cheminformatics and Machine Learning Approaches for GPCR Computer-Aided Drug Design. Abstract below.
Cheminformatics and Machine Learning Approaches for GPCR Computer-Aided Drug Design
Computer-aided drug design (CADD) plays a central role in accelerating modern drug discovery, particularly for complex targets such as G protein–coupled receptors (GPCRs). Within CADD, structure-based drug design (SBDD) and ligand-based drug design (LBDD) offer complementary strategies: SBDD leverages receptor structures to guide de novo design, whereas LBDD exploits known ligands and their bioactivity relationships to propose new analogues. This PhD project aims to develop and integrate machine-learning tools that advance both SBDD- and LBDD-based approaches for GPCR-focused drug discovery.
On the LBDD side, we develop ANNalog, a transformer-based sequence-to-sequence model trained on pairs of molecules extracted from the same bioactivity assays in ChEMBL. Molecules are encoded as SMILES, and Levenshtein distance–guided alignment is used to increase intrapair string similarity, substantially improving generative performance. ANNalog can propose close analogues via subtle substituent changes as well as perform scaffold hopping, yielding structurally distinct yet biologically plausible candidates. Its scaffold-hopping capability is demonstrated on manually curated analogue sets and case studies involving orexin-2 receptor antagonists from patent literature, with a substantial fraction of known scaffolds being rediscovered under user-guided (prefix-controlled) generation.
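As a rough illustration of the alignment idea only (this is not ANNalog's code), the sketch below computes the Levenshtein distance between two SMILES strings and, given several alternative SMILES spellings of one molecule, keeps the spelling closest to its partner. The function names `levenshtein` and `closest_variant` are hypothetical, and the alternative spellings are assumed to be supplied by the caller rather than enumerated here.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def closest_variant(reference: str, variants: Sequence[str]) -> str:
    """Return the SMILES spelling with the smallest edit distance to the reference.

    Assumption: every entry in `variants` encodes the same molecule.
    """
    return min(variants, key=lambda s: levenshtein(reference, s))


# Toluene paired with two equivalent SMILES spellings of phenol.
toluene = "Cc1ccccc1"
phenol_variants = ["c1ccc(O)cc1", "Oc1ccccc1"]
best = closest_variant(toluene, phenol_variants)
```

Here `best` is the spelling that differs from the toluene string by a single character, which is the kind of increased intrapair string similarity the abstract refers to.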
On the SBDD side, we introduce MultiStructure, a generative AI framework that incorporates multiple receptor conformations into a reinforcement learning–driven recurrent neural network for de novo molecule design. Molecules are generated as SMILES strings and optimised using a customised rank-by-vote objective that combines docking scores across several receptor structures. Applied to the dopamine D₂ receptor as a model GPCR, MultiStructure-generated molecules exhibit improved docking scores and higher interaction similarity to co-crystallised ligands compared with single-structure baselines, indicating that explicit multi-receptor integration can enhance the quality and robustness of proposed chemotypes.
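The abstract does not specify the customised rank-by-vote objective, so the following is only a generic sketch of rank-based voting across receptor structures, assuming a Borda-style count and assuming that lower docking scores are better. The function name `rank_by_vote`, the molecule labels and the scores are all hypothetical.

```python
from typing import Dict, List


def rank_by_vote(scores: Dict[str, List[float]]) -> Dict[str, int]:
    """Aggregate docking scores from several receptor structures by rank voting.

    `scores` maps each molecule to one docking score per receptor structure.
    Assumption: lower (more negative) scores are better. For each structure,
    the best-ranked molecule earns n - 1 points, the next n - 2, and so on.
    """
    molecules = list(scores)
    n_structures = len(scores[molecules[0]])
    votes = {m: 0 for m in molecules}
    for k in range(n_structures):
        # Rank molecules for this structure, best docking score first.
        ordered = sorted(molecules, key=lambda m: scores[m][k])
        for position, m in enumerate(ordered):
            votes[m] += len(molecules) - 1 - position
    return votes


# Made-up docking scores for three molecules against three structures.
example_scores = {
    "mol_A": [-9.0, -7.0, -8.0],
    "mol_B": [-8.0, -8.5, -8.2],
    "mol_C": [-6.0, -6.5, -9.5],
}
votes = rank_by_vote(example_scores)
```

In this toy example `mol_B` collects the most votes even though it is the top scorer against only one structure, which illustrates why a rank-based consensus can favour molecules that dock consistently well across conformations.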
Updated by: James Thomas