
CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning

Emanuele Frascaroli; Aniello Panariello; Pietro Buzzega; Lorenzo Bonicelli; Angelo Porrello; Simone Calderara
2024

Abstract

With the emergence of Transformers and Vision-Language Models (VLMs) such as CLIP, fine-tuning large pre-trained models has recently become a prevalent strategy in Continual Learning. This has led to the development of numerous prompting strategies to adapt transformer-based models without incurring catastrophic forgetting. However, these strategies often compromise the original zero-shot capabilities of the pre-trained CLIP model and struggle to adapt to domains that significantly deviate from the pre-training data. In this work, we propose Continual Generative training for Incremental prompt-Learning, a simple and novel approach to mitigate forgetting while adapting CLIP. Briefly, we employ Variational Autoencoders (VAEs) to learn class-conditioned distributions within the embedding space of the visual encoder. We then exploit these distributions to sample new synthetic visual embeddings and train the corresponding class-specific textual prompts during subsequent tasks. Through extensive experiments on different domains, we show that such a generative replay approach can adapt to new tasks while improving zero-shot capabilities, evaluated using a novel metric tailored for CL scenarios. Notably, further analysis reveals that our approach can bridge the gap with joint prompt tuning. The codebase is available at https://github.com/aimagelab/mammoth.
British Machine Vision Conference
Glasgow, UK
25th - 28th November 2024
CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning / Frascaroli, Emanuele; Panariello, Aniello; Buzzega, Pietro; Bonicelli, Lorenzo; Porrello, Angelo; Calderara, Simone. - (2024). (Paper presented at the British Machine Vision Conference, held in Glasgow, UK, 25th - 28th November 2024).
Files in this record:
BMVC2024_CGIL.pdf (Open access)
Type: Author's version, revised and accepted for publication
Size: 359.78 kB
Format: Adobe PDF

Creative Commons License
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise stated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1353589