Research
Computer Vision, Deep Learning
Interests
My research interests are mainly in the problems of image & video recognition, detection and tracking, pose estimation, image & video generation, 3D reconstruction and super-resolution, with humans and their actions being the focal point of my research. I have approached these problems mainly using tools from Mathematical Optimization and Machine Learning. My current focus is on Compute & Data Efficient Deep Learning and its application to video recognition.
Publications

Publications of specific relevance to the Centre for Multimodal AI
2025
Compress & Cache: Vision token compression for efficient generation and retrieval
Bulat A, Ouali Y, Tzimiropoulos G
NeurIPS 2025. The Thirty-Ninth Annual Conference on Neural Information Processing Systems.
29-10-2025
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
Yang H, Bulat A, Hadji I, Pham HX, Zhu X, Tzimiropoulos G, Martinez B
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). vol. 00, 2459-2468.
17-06-2025
VladVA: Discriminative Fine-tuning of LVLMs
Ouali Y, Bulat A, Xenos A, Zaganidis A, Metaxas IM, Martinez B, Tzimiropoulos G
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). vol. 00, 4101-4111.
17-06-2025
Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning
Hadji I, Noroozi M, Escorcia V, Zaganidis A, Martinez B, Tzimiropoulos G
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). vol. 00, 12789-12798.
17-06-2025
Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions
Ntinou I, Xenos A, Ouali Y, Bulat A, Tzimiropoulos G
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 14057-14073.
01-01-2025
2024
Memsvd: Long-Range Temporal Structure Capturing Using Incremental SVD
Ntinou I, Sanchez E
2024 IEEE International Conference on Image Processing (ICIP). vol. 00, 458-464.
30-10-2024
MobileQuant: Mobile-friendly Quantization for On-device Language Models
Tan F, Lee R, Dudziak Ł, Hu SX, Bhattacharya S, Hospedales T, Tzimiropoulos G, Martinez B
arXiv.
04-10-2024
Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
Maniadis Metaxas I, Tzimiropoulos G, Patras I
European Conference on Computer Vision 2024 29 Sep 2024 - 4 Oct 2024.
29-09-2024
Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
Metaxas IM, Tzimiropoulos G, Patras I
arXiv.
15-07-2024
QBB: Quantization with Binary Bases for LLMs
Bulat A, Ouali Y, Tzimiropoulos G
Advances in Neural Information Processing Systems 37, 3209-3228.
01-01-2024
CemiFace: Center-based Semi-hard Synthetic Face Generation for Face Recognition
Patras I, Song S, Sun Z, Tzimiropoulos G
Advances in Neural Information Processing Systems 37, 35612-35638.
01-01-2024
Efficient Vision-Language pre-training via domain-specific learning for human activities
Bulat A, Ouali Y, Guerrero R, Martinez B
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 7978-8000.
01-01-2024
2023
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models
Bulat A, Tzimiropoulos G
International Journal of Computer Vision,
Springer Nature vol. 132 (4), 1108-1125.
25-10-2023
Black Box Few-Shot Adaptation for Vision-Language models
Ouali Y, Bulat A, Martinez B
2023 IEEE/CVF International Conference on Computer Vision (ICCV). vol. 00, 15488-15500.
06-10-2023
FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training
Bulat A, Guerrero R, Tzimiropoulos G
2023 IEEE/CVF International Conference on Computer Vision (ICCV). vol. 00, 11759-11768.
06-10-2023
ReGen: A good Generative zero-shot video classifier should be Rewarded
Bulat A, Martinez B
2023 IEEE/CVF International Conference on Computer Vision (ICCV). vol. 00, 13477-13487.
06-10-2023
HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces
Bounareli S, Tzelepis C, Argyriou V, Patras I, Tzimiropoulos G
2023 IEEE/CVF International Conference on Computer Vision (ICCV). vol. 00, 7115-7125.
06-10-2023
Bayesian Prompt Learning for Image-Language Model Generalization
Derakhshani MM, Sanchez E, Bulat A, Da Costa VGT, Martinez B
2023 IEEE/CVF International Conference on Computer Vision (ICCV). vol. 00, 15191-15200.
06-10-2023
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models
Bulat A
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). vol. 00, 23232-23241.
24-06-2023
From Keypoints to Object Landmarks via Self-Training Correspondence: A Novel Approach to Unsupervised Landmark Discovery
Mallis D, Sanchez E, Bell M
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Institute of Electrical and Electronics Engineers (IEEE) vol. 45 (7), 8390-8404.
05-06-2023
2022
Part-based Face Recognition with Vision Transformers
Sun Z
British Machine Vision Conference.
21-11-2022
Finding Directions in GAN’s Latent Space for Neural Face Reenactment
Bounareli S, Tzimiropoulos G
British Machine Vision Conference.
21-11-2022
EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers
Pan J, Bulat A, Tan F, Zhu X, Li H
Lecture Notes in Computer Science. vol. 13671, 294-311.
01-01-2022
Pre-training Strategies and Datasets for Facial Representation Learning
Bulat A, Cheng S, Yang J, Garbett A
Lecture Notes in Computer Science. vol. 13673, 107-125.
01-01-2022
2021
Space-time Mixing Attention for Video Transformer
Bulat A, Perez-Rua J-M, Tzimiropoulos G
Thirty-fifth Conference on Neural Information Processing Systems.
06-12-2021
WarpedGANSpace: Finding non-linear RBF paths in GAN latent space
Tzelepis C, Tzimiropoulos G, Patras I
2021 IEEE/CVF International Conference on Computer Vision (ICCV). vol. 00, 6373-6382.
17-10-2021
Bit-Mixer: Mixed-precision networks with runtime bit-width selection
Bulat A
2021 IEEE/CVF International Conference on Computer Vision (ICCV). vol. 00, 5168-5177.
17-10-2021
Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition
Sanchez E, Valstar M, Tzimiropoulos G
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). vol. 00, 9070-9080.
25-06-2021
Knowledge distillation via softmax regression representation learning
Yang J, Martinez B, Bulat A
International Conference on Learning Representations (ICLR).
04-05-2021
High-Capacity Expert Binary Networks
Bulat A, Tzimiropoulos G
International Conference on Learning Representations (ICLR).
04-05-2021
Self-supervised Learning of Person-specific Facial Dynamics for Automatic Personality Recognition
Song S, Sanchez E, Tzimiropoulos G, Shen L, Valstar M
IEEE Transactions on Affective Computing,
Institute of Electrical and Electronics Engineers.
09-03-2021
A Transfer Learning approach to Heatmap Regression for Action Unit intensity estimation
Ntinou IN, Sanchez E, Bulat A, Tzimiropoulos G
IEEE Transactions on Affective Computing.
23-02-2021
2020
Unsupervised Learning of Object Landmarks via Self-Training Correspondence
Dimitrios M, Enrique S
Advances in Neural Information Processing Systems (NeurIPS) 6 Dec 2020 - 12 Dec 2020.
06-12-2020
AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces
Khan MH, Khan S, Shahabuddin M, Khan FS
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). vol. 00, 6937-6946.
13-06-2020
FAN-Face: a Simple Orthogonal Improvement to Deep Face Recognition
Yang J, Bulat A
Proceedings of the AAAI Conference on Artificial Intelligence,
Association for the Advancement of Artificial Intelligence (AAAI) vol. 34 (07), 12621-12628.
03-04-2020
BATS: Binary ArchitecTure Search
Bulat A, Martinez B, Tzimiropoulos G
Lecture Notes in Computer Science. vol. 12368, 309-325.
01-01-2020
2019
T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor
Kossaifi J, Bulat A, Tzimiropoulos G, Pantic M
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). vol. 00, 7814-7823.
20-06-2019
2018
Hierarchical Binary CNNs for Landmark Localization with Limited Resources
Bulat A, Tzimiropoulos G
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Institute of Electrical and Electronics Engineers (IEEE) vol. 42 (2), 343-356.
23-08-2018
2017
Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression
Jackson AS, Bulat A, Tzimiropoulos G
2017 IEEE International Conference on Computer Vision (ICCV), 1031-1039.
01-10-2017
How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks)
Bulat A, Tzimiropoulos G
2017 IEEE International Conference on Computer Vision (ICCV), 1021-1030.
01-10-2017
A Functional Regression Approach to Facial Landmark Tracking
Sanchez-Lozano E, Tzimiropoulos G, Martinez B, De la Torre F
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Institute of Electrical and Electronics Engineers (IEEE) vol. 40 (9), 2037-2050.
29-08-2017
2016
Convolutional aggregation of local evidence for large pose face alignment
Bulat A
Proceedings of the British Machine Vision Conference 2016, 86.1-86.12.
01-01-2016
Human Pose Estimation via Convolutional Part Heatmap Regression
Bulat A
Lecture Notes in Computer Science. vol. 9911, 717-732.
01-01-2016
2014
Gauss-Newton Deformable Part Models for Face Alignment in-the-Wild
Tzimiropoulos G, Pantic M
2014 IEEE Conference on Computer Vision and Pattern Recognition, 1851-1858.
01-06-2014
A Simple Baseline for Knowledge-Based Visual Question Answering
Xenos A, Stafylakis T, Patras I, Tzimiropoulos G
Empirical Methods in Natural Language Processing 6 Dec 2023 - 10 Dec 2023.