Semantically Conditioned Prompts for Visual Recognition under Missing Modality Scenarios

Pipoli, Vittorio; Bolelli, Federico; Sarto, Sara; Cornia, Marcella; Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita; Ficarra, Elisa

doi:10.1109/WACV61041.2025.00486

This paper addresses multimodal prompting for visual recognition with multimodal Transformers when some modalities are missing. It makes two main contributions: (i) a novel prompt learning module designed to produce sample-specific prompts, and (ii) evidence that modality-agnostic prompts can effectively adapt to diverse missing-modality scenarios. The proposed model, termed SCP, uses the semantic representation of the available modalities to query a learnable memory bank, which allows prompts to be generated from the semantics of the input. Notably, SCP differs from existing methods in that it self-adjusts to both the missing-modality scenario and the semantic context of the input, without prior knowledge of which modality is missing or of the number of modalities. Extensive experiments show the effectiveness of the proposed prompt learning framework and demonstrate improved performance and robustness across a range of missing-modality cases.
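The abstract describes prompts generated by querying a learnable memory bank with the semantics of whichever modalities are available. The sketch below illustrates that general idea only and is not the authors' implementation. The mean pooling of available features, the scaled dot-product lookup, the shared feature dimension, and all names (`mean_pool`, `generate_prompt`, `keys`, `values`) are assumptions made for illustration. The memory bank is randomly initialised here, whereas in a real model it would be learned.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(features):
    """Average the feature vectors of the modalities that are present.

    Missing modalities are passed as None and skipped, so the caller does
    not need to say which modality is missing or how many there are.
    Assumption: all modalities are already projected to the same dimension.
    """
    present = [f for f in features if f is not None]
    if not present:
        raise ValueError("at least one modality must be available")
    dim = len(present[0])
    return [sum(f[i] for f in present) / len(present) for i in range(dim)]


def generate_prompt(features, keys, values):
    """Query a memory bank (keys -> values) with the pooled input semantics.

    Returns the prompt vector (a weighted sum of the value vectors) and the
    attention weights over the memory slots.
    """
    query = mean_pool(features)
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    prompt_dim = len(values[0])
    prompt = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(prompt_dim)]
    return prompt, weights


# Hypothetical sizes: feature dim 4, prompt dim 3, 5 memory slots.
rng = random.Random(0)
keys = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
values = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]

image_feat = [0.2, -0.1, 0.5, 0.3]
text_feat = None  # the text modality is missing for this sample

prompt, weights = generate_prompt([image_feat, text_feat], keys, values)
```

Because the query is built only from the features that are present, the same function handles any subset of modalities, which mirrors the modality-agnostic behaviour the abstract claims for SCP.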

Semantically Conditioned Prompts for Visual Recognition under Missing Modality Scenarios / Pipoli, V., Bolelli, F., Sarto, S., Cornia, M., Baraldi, L., Grana, C., Cucchiara, R., Ficarra, E. - (2025), pp. 4968-4977. (2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025, Tucson, Arizona, Feb 28 - Mar 4 2025) [10.1109/WACV61041.2025.00486].

Publication year: 2025
Conference title: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
Conference location: Tucson, Arizona
Conference dates: Feb 28 - Mar 4 2025
DOI: https://dx.doi.org/10.1109/WACV61041.2025.00486
WoS ID: WOS:001481328900476
Scopus ID: 2-s2.0-105003639530
Series: IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION
First page: 4968
Last page: 4977
Type: Conference proceedings paper

Files in this item:

File	Access	Type	License	Size	Format
main.pdf	Restricted	AAM (author's version, revised and accepted for publication)	[IR] closed	8.73 MB	Adobe PDF
Semantically_Conditioned_Prompts_for_Visual_Recognition_Under_Missing_Modality_Scenarios.pdf	Restricted	VOR (version published by the publisher)	[IR] closed	8.78 MB	Adobe PDF

Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.