
Decoding Facial Expressions in Video: A Multiple Instance Learning Perspective on Action Units / Del Gaudio, Livia; Cuculo, Vittorio; Cucchiara, Rita. - (2025). (Paper presented at the International Conference on Image Analysis and Processing, ICIAP Workshops, 2025, held in Rome, Italy, on 20/09/2025).

Decoding Facial Expressions in Video: A Multiple Instance Learning Perspective on Action Units

Livia Del Gaudio; Vittorio Cuculo; Rita Cucchiara
2025

Abstract

Facial expression recognition (FER) in video sequences is a longstanding challenge in affective computing and computer vision, particularly due to the temporal complexity and subtlety of emotional expressions. In this paper, we propose a novel pipeline that leverages facial Action Units (AUs) as structured time series descriptors of facial muscle activity, enabling emotion classification in videos through a Multiple Instance Learning (MIL) framework. Our approach models each video as a bag of AU-based instances, capturing localized temporal patterns, and allows for robust learning even when only coarse video-level emotion labels are available. Crucially, the approach incorporates interpretability mechanisms that highlight the temporal segments most influential to the final prediction, providing informed decision-making and facilitating downstream analysis. Experimental results on benchmark FER video datasets demonstrate that our method achieves competitive performance using only visual data, without requiring multimodal signals or frame-level supervision. This highlights its potential as an interpretable and efficient solution for weakly supervised emotion recognition in real-world scenarios.
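The abstract describes videos as "bags" of AU-based instances, with an attention-style mechanism indicating which temporal segments drive the video-level prediction. A minimal sketch of such attention-based MIL pooling is given below; all dimensions, parameter names, and the gating choice are illustrative assumptions (loosely following the attention-MIL formulation of Ilse et al., 2018), not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a video is a "bag" of T temporal segments, each
# described by the intensities of 17 facial Action Units (AUs).
T, n_aus, n_classes = 30, 17, 7
bag = rng.random((T, n_aus))             # instances: AU activations per segment

# Randomly initialised parameters of an attention pooling head;
# the paper's actual layer sizes and training procedure may differ.
d_attn = 8
V = rng.normal(scale=0.1, size=(n_aus, d_attn))
w = rng.normal(scale=0.1, size=(d_attn, 1))
W_cls = rng.normal(scale=0.1, size=(n_aus, n_classes))

# Attention scores: one weight per temporal segment, softmax-normalised
# (subtracting the max for numerical stability).
scores = np.tanh(bag @ V) @ w            # shape (T, 1)
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                     # attention weights, sum to 1

# Bag-level embedding = attention-weighted sum of instance features,
# then a linear classifier produces video-level emotion logits.
z = (alpha * bag).sum(axis=0)            # shape (n_aus,)
logits = z @ W_cls                       # shape (n_classes,)
pred = int(np.argmax(logits))

# The attention weights double as the interpretability signal:
# segments with high alpha are the ones most influential to the prediction.
top_segments = np.argsort(alpha.ravel())[::-1][:3]
print("predicted class:", pred)
print("most influential segments:", top_segments)
```

Only video-level labels would be needed to train such a head, since the loss is computed on the bag-level logits while the per-segment weights emerge as a by-product.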
Files in this record:
ID-6-DelGaudio-Livia.pdf (Open access)
Type: AAM - Author's revised version, accepted for publication
Size: 858.49 kB
Format: Adobe PDF

Creative Commons license
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise indicated.
In case of copyright violation, contact Iris Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1383348