Research
Computer Vision, Deep Learning
Interests
My research interests are mainly in the problems of image & video recognition, detection and tracking, pose estimation, image & video generation, 3D reconstruction and super-resolution, with humans and their actions being the focal point of my research. I have approached these problems mainly using tools from Mathematical Optimization and Machine Learning. My current focus is on Compute & Data Efficient Deep Learning and its application to video recognition.
Publications

Publications of specific relevance to the Centre for Multimodal AI
2025
Compress & Cache: Vision token compression for efficient generation and retrieval
Bulat A Ouali Y Tzimiropoulos G
NeurIPS 2025. The Thirty-Ninth Annual Conference on Neural Information Processing Systems.
29-10-2025
2024
Memsvd: Long-Range Temporal Structure Capturing Using Incremental SVD
Ntinou I Sanchez E
2024 IEEE International Conference on Image Processing (ICIP). vol. 00, 458-464.
30-10-2024
Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
Maniadis Metaxas I Tzimiropoulos G Patras I
European Conference on Computer Vision 2024 29 Sep 2024 - 4 Oct 2024.
29-09-2024
MobileQuant: Mobile-friendly Quantization for On-device Language Models
Tan F Lee R Dudziak Ł Hu SX Bhattacharya S Hospedales T Tzimiropoulos G Martinez B
arXiv 25-08-2024
Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
Metaxas IM Tzimiropoulos G Patras I
arXiv 15-07-2024
2023
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models
Bulat A
International Journal of Computer Vision,
Springer 26-10-2023
Black Box Few-Shot Adaptation for Vision-Language models
Ouali Y Bulat A Martinez B
International Conference on Computer Vision.
02-10-2023
FS-DETR: Few-shot detection transformer with prompting and without re-training
Bulat A Guerrero R Martinez B
International Conference on Computer Vision.
02-10-2023
ReGen: A good Generative zero-shot video classifier should be Rewarded
Bulat A Sanchez E Martinez B
International Conference on Computer Vision.
02-10-2023
HyperReenact: one-shot reenactment via jointly learning to refine and retarget faces
Bounareli S Tzelepis C Argyriou V Patras I Tzimiropoulos G
International Conference on Computer Vision.
02-10-2023
Bayesian Prompt Learning for Image-Language Model Generalization
Derakhshani MM Sanchez E Bulat A Turrisi da Costa VG Martinez B
International Conference on Computer Vision.
02-10-2023
From Keypoints to Object Landmarks via Self-Training Correspondence: A novel approach to Unsupervised Landmark Discovery
Mallis D Sanchez E Bell M
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Institute of Electrical and Electronics Engineers 04-01-2023
2022
Part-based Face Recognition with Vision Transformers
Sun Z
British Machine Vision Conference.
21-11-2022
Finding Directions in GAN’s Latent Space for Neural Face Reenactment
Bounareli S Tzimiropoulos G
British Machine Vision Conference.
21-11-2022
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Pan J Bulat A Tan F Zhu X Dudziak L Li H Tzimiropoulos G
European Conference on Computer Vision.
25-10-2022
Pre-training strategies and datasets for facial representation learning
Bulat A Cheng S Yang J Sanchez E
European Conference on Computer Vision.
25-10-2022
2021
Space-time Mixing Attention for Video Transformer
Bulat A Perez-Rua J-M Tzimiropoulos G
Thirty-fifth Conference on Neural Information Processing Systems.
06-12-2021
Bit-Mixer: Mixed-precision networks with runtime bit-width selection
Bulat A Tzimiropoulos G
International Conference on Computer Vision (ICCV) 11 Oct 2021 - 17 Oct 2021.
11-10-2021
Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition
Sanchez E Tellamekala MK Tzimiropoulos G
IEEE/CVF Conference on Computer Vision and Pattern Recognition 19 Jun 2021 - 25 Jun 2021.
21-06-2021
Knowledge distillation via softmax regression representation learning
Yang J Martinez B Bulat A
International Conference on Learning Representations (ICLR).
04-05-2021
High-Capacity Expert Binary Networks
Bulat A Tzimiropoulos G
International Conference on Learning Representations (ICLR).
04-05-2021
Self-supervised Learning of Person-specific Facial Dynamics for Automatic Personality Recognition
Song S Sanchez E Tzimiropoulos G Shen L Valstar M
IEEE Transactions on Affective Computing,
Institute of Electrical and Electronics Engineers 09-03-2021
A Transfer Learning approach to Heatmap Regression for Action Unit intensity estimation
Ntinou IN Sanchez E Bulat A Tzimiropoulos G
IEEE Transactions on Affective Computing 23-02-2021
2020
Unsupervised Learning of Object Landmarks via Self-Training Correspondence
Dimitrios M Enrique S
Advances in Neural Information Processing Systems (NeurIPS) 6 Dec 2020 - 12 Dec 2020.
06-12-2020
BATS: Binary ArchitecTure Search
Bulat A Martinez B
European Conference on Computer Vision (ECCV) 23 Aug 2020 - 28 Aug 2020.
24-08-2020
FAN-Face: a Simple Orthogonal Improvement to Deep Face Recognition
Yang J Bulat A
Proceedings of the AAAI Conference on Artificial Intelligence,
Association for the Advancement of Artificial Intelligence (AAAI) vol. 34 (07), 12621-12628.
03-04-2020
2019
T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor
Kossaifi J Bulat A Tzimiropoulos G Pantic M
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). vol. 00, 7814-7823.
20-06-2019
2018
Hierarchical Binary CNNs for Landmark Localization with Limited Resources
Bulat A Tzimiropoulos G
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Institute of Electrical and Electronics Engineers vol. 42 (2), 343-356.
23-08-2018
2017
Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression
Jackson AS Argyriou V Tzimiropoulos G
IEEE International Conference on Computer Vision. vol. 2017-October, 1031-1039.
22-12-2017
How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks)
Bulat A Tzimiropoulos G
2017 IEEE International Conference on Computer Vision (ICCV), 1021-1030.
01-10-2017
A Functional Regression Approach to Facial Landmark Tracking
Sanchez-Lozano E Tzimiropoulos G Martinez B Torre FDL
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Institute of Electrical and Electronics Engineers vol. 40 (9), 2037-2050.
29-08-2017
2016
Human pose estimation via convolutional part heatmap regression
Bulat A
https://link.springer.com/conference/eccv. vol. 9911 LNCS, 717-732.
16-09-2016
Convolutional aggregation of local evidence for large pose face alignment
Bulat A
Proceedings of the British Machine Vision Conference 2016, 86.1-86.12.
01-01-2016
2014
Gauss-Newton deformable part models for face alignment in-the-wild
Tzimiropoulos G Pantic M
IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1851-1858.
25-09-2014
QBB: Quantization with Binary Bases for LLMs
Bulat A Ouali Y Tzimiropoulos G
Neural Information Processing Systems.
CemiFace: Center-based Semi-hard Synthetic Face Generation for Face Recognition
Sun Z Song S Patras I Tzimiropoulos G
Neural Information Processing Systems (NeurIPS 2024).
Efficient Vision-Language pre-training via domain-specific learning for human activities
Bulat A Ouali Y Guerrero R Martinez B Tzimiropoulos G
Empirical Methods in Natural Language Processing.
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
Yang H Bulat A Hadji I Pham HX Zhu X Tzimiropoulos G Martinez B
The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025.
VLADVA: Discriminative Fine-tuning of LVLMs
Ouali Y Bulat A Xenos A Maniadis Metaxas I Martinez B Tzimiropoulos G
The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025.
Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning
Hadji I Noroozi M Escorzia V Zaganidis A Martinez B Tzimiropoulos G
The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025.
Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene
Ntinou I Xenos A Ouali Y Bulat A Tzimiropoulos G
2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025).
AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces
Khan MH McDonagh J Khan S Shahabuddin M Arora A Khan FS Shao L Tzimiropoulos G
2020 Conference on Computer Vision and Pattern Recognition.
WarpedGANSpace: Finding non-linear RBF paths in GAN latent space
Tzelepis C Tzimiropoulos G Patras I
International Conference on Computer Vision 11 Oct 2021 - 17 Oct 2021.
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models
Bulat A Tzimiropoulos G
IEEE/CVF Conference on Computer Vision and Pattern Recognition.
A Simple Baseline for Knowledge-Based Visual Question Answering
Xenos A Stafylakis T Patras I Tzimiropoulos G
Empirical Methods in Natural Language Processing 6 Dec 2023 - 10 Dec 2023.