ClusterFix: A Cluster-Based Debiasing Approach without Protected-Group Supervision

The failures of Deep Networks can sometimes be ascribed to biases in the data or algorithmic choices. Existing debiasing approaches exploit prior knowledge to avoid unintended solutions; we acknowledge that, in real-world settings, it could be unfeasible to gather enough prior information to characterize the bias, or it could even raise ethical considerations. We hence propose a novel debiasing approach, termed ClusterFix, which does not require any external hint about the nature of biases. Such an approach alters the standard empirical risk minimization and introduces a per-example weight, encoding how critical and far from the majority an example is. Notably, the weights consider how difficult it is for the model to infer the correct pseudo-label, which is obtained in a self-supervised manner by dividing examples into multiple clusters. Extensive experiments show that the misclassification error incurred in identifying the correct cluster allows for identifying examples prone to bias-related issues. As a result, our approach outperforms existing methods on standard benchmarks for bias removal and fairness.


Introduction
Artificial intelligence systems may suffer inductive bias from data (data bias), but even algorithmic design choices can expedite wrong decisions during training (algorithm bias) [17,20,30].Therefore, enhancing their trustworthiness requires considering more than just performance [3,18,19,25,45].More in general, learning systems exhibit the Principle of Least Effort [17]: small input changes or different contexts can cause significant deviations in object recognition [2,35,43]; minorities or underrepresented groups can suffer high error rates [4,12,40].
To address these challenges, there has been widespread interest in debiasing methods that aim to mitigate unintended solutions.Debiasing interventions can occur before the learning procedure (pre-processing), during model training (in-processing), or after training (post-processing) [24].In particular, in-processing approaches act directly on the algorithm design and effectively mitigate biases.Proposed methods directly debias the model adjusting sample importance [23,36,42] and using adversarial learning [7,26].Other techniques employ quantitative fairness metrics as regularization [6] and optimization constraints [21].Although these works address the problem, they require prior knowledge of protected groups, achieved by grouping samples according to their target and bias attribute values, such as gender.Accessing such information is often infeasible due to privacy and ethical constraints.Furthermore, identifying and quantifying bias attributes a priori can pose challenges in complex systems (e.g., organism, climate, and cognition).

So, what if the protected groups are missing during the learning phase?
Our goal is to train a model to mitigate performance disparities among protected groups without exploiting such information during training.We aim to achieve this objective while ensuring satisfactory average performance.These qualities are particularly essential for practical decision-making algorithms.As an example, in healthcare and forensic genetics inaccurate outcomes can pose significant risks to underrepresented communities, claiming for fair and robust solutions [9,28,32,33].An approximation is needed whenever the system does not access protected groups directly.
Our hypothesis posits that the classification of cluster assignments can leverage shared features between bias-aligned and unbiased samples, presenting an opportunity for an effective mitigation of spurious correlations.Similarly to the works of George [41] and BPA [39], our objective is to address this issue estimating protected groups by clustering.
Existing approaches often encounter a common challenge in the presence of dataset bias: it arises from overfitting noisy and densely populated clusters identified during Em-  pirical Risk Minimization (ERM) pre-training, where the focus is solely on minimizing the training loss for the target variable y [39].Furthermore, these methods rest heavily on the assumption that critical samples can be easily identified through cluster assignments within the ERM feature space.However, real-world scenarios do not guarantee the validity of such an assumption, as critical samples may be dispersed across clusters rather than concentrated in specific ones.
In contrast, our approach takes a different standpoint by acknowledging the dispersed distribution of critical samples within the feature space.Our method involves clustering a self-supervised feature space that operates independently of the label y; in addition to optimizing the target objective, we seek to minimize the cluster classification error.Consequently, we define our weighting mechanism policy by incorporating the auxiliary clustering loss with the target objective.To demonstrate the effectiveness of our approach, we provide a compelling illustration in Fig. 1 with an example.
Identifying Bias.As illustrated in Fig. 1, we consider a dataset partitioned on a target label, such as "Wearing Necklace", where each partition is biased towards a protected attribute like gender.Following the clustering process and assuming these assignments as ground truth, a classifier may encounter challenges within a cluster where the majority of individuals x do not possess the same protected attribute (highlighted in Fig. 1 as critical samples within the grey clusters).To address this challenge, we aim to learn new features that can identify a common per-cluster attribute and achieve satisfactory performance.This means looking for features that are consistent with the original cluster assignment, but different from the protected attribute.We believe that up-weighting samples belonging to clusters with high cluster classification loss can aid in identifying data distributions that enhance model robustness and mitigate bias.
Observation.To validate our intuitions, we conducted an experiment on the CelebA dataset (denoted as D) with the target attribute y representing "Wearing Necklace" and the protected attribute a representing gender (female = 0 and male = 1).In addition, we define a group g as g = (y, a).The target label exhibits a strong correlation with female individuals: within the training set D t , comprising a total of 162, 770 samples, D y=1 consists of 18, 525 females and only 1, 239 males.As aforementioned, we defined critical groups as g 0 : (y = 0, a = 0) and g 1 : (y = 1, a = 1).Consequently, the label z define if x ∈ {g 0 , g 1 } or not.
When training an ERM classifier on y, we observed that the accuracy for the group g 1 (male with necklace) was only 2.72%.Subsequently, based on self-supervised features, we obtained cluster assignments for each partition D y using k-means with k = 8.Afterward, we trained an ERM cluster classifier using these assignments.The inset figure presents the ROC curve for partition D a=1 (blue) and partition D y=1 (orange), which quantifies the correlation between the critical label z and the ERM cluster-objective.Interestingly, we observed a correlation between cluster errors and critical samples in these partitions, indicating the former can be adopted profitably to design an ad-hoc weighting strategy which is the main proposal of our work.In Sec.3.3, we will formally define how our policy detects critical distributions during the learning phase.
ClusterFix.This paper presents ClusterFix (CFix), a practical in-processing debiasing framework aiming to jointly optimize a classification task and an auxiliary one to generalize over groups without their supervision.Our evaluation demonstrates that the proposed solution outperformed the previous state-of-the-art unsupervised debiasing approaches.Moreover, despite not having group annotations during training, ClusterFix outperformed the supervised approach GDRO [36], even without explicit bias information.
Our contributions can be summarized as follow: (1) We propose an effective debiasing method to mitigate inductive bias in real datasets, which does not rely on prior knowledge on protected groups; (2) We show how cluster assignment classification is a practical auxiliary task that improves worst-case robustness and generalization; (3) Our approach reaches state-of-the-art performances in standard bias-removal benchmarks, even w.r.t.supervised methods.

Related Work
Domain Adaptation and Generalization.In the literature, many works focused on achieving generalized models that can handle worst-case scenarios [48,54].In this setting, models run into multiple distributions during the learning phase, designed to generalize well on unseen distributions during testing.In robotics, domain randomization has proven to be an effective technique for dealing with real-world situations [44].Meanwhile, meta-learning aims to acquire the ability to quickly adapt to new conditions by learning how to learn new representations [16,46].Adversarial training can also be a valuable tool for achieving model robustness and analyzing worst-case scenarios [43,52].Adversarial objectives have shown promising generalization abilities and are more aligned with human understanding [37].On the other hand, they cannot provide a theoretical guarantee of reliable performance [25].
Distributionally Robust Optimization.There has been growing interest in fairness and robustness in recent years.Distributionally Robust Optimization (DRO) proposes to guarantee the best outcomes in cases where there is a change in data distribution by controlling the worst-case performance over subsets [14,42].However, since worst-case scenarios may be too pessimistic, others investigate group DRO (GDRO) [36], which optimizes worst-case performance over a known set of groups.However, this algorithm requires sensitive-attribute annotation, which is usually unknown in real datasets; we tackle this issue by estimating partitions through clustering.
Fairness without Demographics.Recently, some researchers have explored more challenging scenarios where they have no access to protected-group labels (so-called unsupervised methods).Our framework aims to tackle this issue by replacing sensitive supervision with clustering assignments.Other methods try to achieve sub-population fairness and worst-case generalization without sensitive prior knowledge by estimating different training data distributions.Our work resembles two cluster-based debiasing strategies, George [41] and BPA [39].Although both methods employ pseudo-labels to partition the dataset into clusters, George has been found vulnerable to outliers overfitting, resulting in suboptimal performance on practical datasets such as CelebA [39].
Deep Clustering and Representation Learning.Our approach leverages clustering to estimate groups and enhance structure-cluster information.In unsupervised settings, clustering serves as a practical training objective for generating improved feature representations.For instance, [5,11] proposed using cluster assignments as the main training objective to mitigate bias stemming from proxy objectives.In [50], Yan et al. enhanced generalization by pre-training a network for clustering in the original feature space and fine-tuning a new model based on cluster assignments.Other approaches have employed deep clustering to jointly learn cluster assignments and representations, preventing feature space collapse [15], or minimizing KL divergence between a centroid-based probability distribution and an auxiliary target distribution [49].Several clustering methods have recently proposed simultaneously learning fair representations and clustering to conceal sensitive attributes, leveraging protected-group knowledge [1,10,26].

Problem Setup
We are given a set of n datapoints x 1 , ..., x n ∈ X associated with a binary label y i ∈ 0, 1 .In addition, each datapoint is associated with a label c i ∈ 1...C defined by cluster assignment.We train a deep neural network composed of a feature extractor F : X → R d , a binary classifier T : R d → R 2 , and a set of auxiliary classifiers C y : R d → R C , one for each task label y.The classification performance of y is evaluated on pre-defined groups G based on the average and worst-group accuracy as in [36].The group label g ∈ G is only used for metric evaluation and is not accessible during optimization.The objective is to achieve good average and worst-group accuracy at test time without training group annotations.

Relevant Objective Functions
In this section, we describe in detail four training approaches to introduce our work in the next section.
Empirical Risk Minimization.Typically, a model defined by a feature extractor F and a target classifier T try to solve the following optimization problem: where represents a loss function, y is the target, and x is the input.In other words, ERM minimizes the average loss across data points.In general, this kind of procedure needs better generalization in some groups during inference.
Group Distributionally Robust Optimization [36].The Group DRO approach utilizes training group annotations to reduce the maximum group error within the training set.Assuming the availability of group annotations for the training data, the objective function for Group DRO can be expressed as: George [41].Since g is often not available in real scenarios, George approximates protected groups with cluster assignments.More specifically, this method is organized as follows: (i) train a model via ERM, (ii) cluster the feature space, (iii) train the final model from scratch with clustering information as in Eq. ( 3).The central empirical hypothesis here is that the feature space of deep neural networks trained via ERM carries information about group labels.
arg min BPA [39].BPA takes advantage of the same hypothesis of George: protected groups are approximated by clustering after ERM pretraining.On the other hand, the optimization process is guided by dynamically reweighting sample importance.
arg min where the sample weight w c i is given by the following equation: 3. Proposed Method: ClusterFix

Overview
ClusterFix (CFix) is a two-stage debiasing approach that does not require protected-group supervision.Similarly to previous works, CFix weighs the contribution of each example to the overall classification loss as in Eq. ( 4); differently, the importance of each example does not depend only on the mismatch between the prediction and ground-truth label y, but also on an additional term that considers how examples cluster in latent space.Briefly, our approach divides into two subsequent steps, as follows.
Cluster Assignment.The first stage regards preparing a pretext task, which will be optimized in the second stage.In particular, the idea is to cluster the latent space of a deep neural network into several groups, thus yielding novel pseudolabels c from the computed assignments to the clusters.In doing so, CFix leverages self-supervised pre-training (carried out through Barlow Twins [51]), whose latent space is afterward clustered through the k-means algorithm.Such a preliminary stage favorably lowers the risk of biased representation during the downstream task.Indeed, the pseudolabels produced as such are opaque w.r.t. the target biasprone task, as they only rely on the self-extracted features.Debiasing Training.In the second stage, we ask the model to pursue a twofold objective: not only it has to learn the target task embodied by y, but there is a tailored objective that constrains the feature space.In particular, it seeks the model to remain consistent with the original cluster assignments c.Eventually, to preserve minority groups, we weight the importance of each example proportionally to the average error of its cluster, where the concept of error considers both original and pseudo labels y and c.In this respect, what differentiates CFix from existing approaches is that the clustering membership has an active and direct contribution to the total learning objective.
As an example, George [41] as well performs a preliminary clustering step; however, while its authors exploited clustering to compute the classification loss w.r.t. an independent sample grouping (represented by c), we instead use a metric related to c to weight examples.
To provide an intuition, we refer to the observation in Sec. 1 in which a clear dependence arises between an independent clusters' assignment metric (e.g., an ERM pretrained classifier) and the presence of protected features.For this purpose, this suggests that the empirical risk paid by a wrong cluster assignment can be exploited as a proxy of the importance that should be given to an element in terms of contribution to the task loss on y.

Step 1: Cluster Assignment
First, the dataset was divided by label, from which features were extracted from a pre-trained model.Next, kmeans was applied to obtain cluster assignments c, which were used as categorical labels for the next stage.Specifically, a self-supervised trained model F : X → R d was used for feature extraction.

Step 2: Debiased Training
Let F : X → R d , T : R d → R 2 , and C y : R d → R C represent the functions of the pre-trained encoder, task classifier, and cluster classifier respectively, Fig. 2.Then, the objective function of the proposed method consists of two parts: weighted-classification loss and cluster-structural loss.In the following, we provide details on these two separate terms and the re-weighting strategy for the proposed model.
Clustering-Structural Loss.Achieving smoothness in the feature space through cluster classification can prevent model shortcuts in y classification and mitigate inductive bias.This method is also valuable for identifying problematic clusters with high average entropy and small size.We hypothesize that there is a correlation between high-entropy clusters and out-of-distribution elements, which can assist the model in generalizing better in worst-case scenarios.This approach's benefits are underpinned by information theory [50], as the structural loss creates an information bottleneck that facilitates the identification of proxy objective bias and improves model generalization.Formally, the clustering-structural loss is defined as follows: Task-Weighted Classification Loss.The main objective of the optimization process is to learn how to classify y, which is achieved through task-weighted classification loss.Each sample's classification loss is weighted by a factor w k , which reflects the significance of the cluster to which the sample belongs.The weight is determined based on the average loss concerning y and the structural loss concerning the clusters.The task-weighted classification loss is defined as follows: where w c is: ClusterFix Objective.In summary, the proposed debiasing method's overall objective function can be expressed as the following optimization problem, the trade-off hyperparameter is denoted by γ:

Experiments
In this section, we present the experiments to evaluate the proposed debiasing method and compare it with several state-of-the-art debiasing techniques.We used the same experimental settings as BPA [39] to ensure a fair comparison among different debiasing methods.Specifically, we trained the proposed method using a ResNet-18 as the backbone architecture with the Adam optimizer, a learning rate of 1 × 10 −4 , a batch size of 256 images, and a weight decay rate of 0.01 for 100 epochs.The learning rate was scheduled with cosine annealing.For all experiments, we performed k-means clustering with K = 8 and the cluster weight of the c th cluster, ω c , was updated with a momentum m = 0.3 as in [39].Additional experiments on hyperparameters and datasets can be found in the supplementary material.

Benchmarks
CelebA.CelebA [27] is a dataset comprising 202, 599 celebrity face images with 40 binary attribute annotations for each image.Moreover, in our experiments, the gender attribute has been used as the bias attribute to evaluate the robustness of the proposed method, as in [39].We initialized the feature extractor F parameters by using a self-supervised pre-trained network with Barlow Twins [51].In particular, for the self-supervised training, we used an output dimension of 1 024, a batch size of 256, and an SGD optimizer with a fixed learning rate of 0.6.We set the λ parameter to 0.5.
Following [39], we focused on gender as the fixed bias attribute and excluded 8 out of 40 attributes due to limited samples in the test set.Among the remaining 32 attributes, 26 exhibited a significant correlation with gender, showing a classification accuracy gap of over 5% compared to unbiased accuracy [36].To explore diverse scenarios, we selected the top 5 attributes with the highest gap and the bottom 5 attributes with the lowest gap, as identified in [39].
Waterbirds.The Waterbirds dataset [36] is designed to evaluate the robustness of deep networks w.r.t.spurious correlations and distribution shifts.It has been created by selecting images of birds from the Caltech-UCSD Birds-200-2011 dataset [47] and overlaying them on backgrounds obtained from the Places dataset [53].This dataset includes two attributes: the type of bird, which can be either a waterbird or landbird, and the background place, which can be either water or land.The training set comprises 4, 791 samples, 56 out of the 1, 113 waterbird samples have a land background,  • Debiased Representations with Pseudo-Attributes (BPA) [39]: a debiased representation is learned by introducing cluster-generated pseudo-attributes; • Group Distributionally Robust Optimization (GDRO) [36]: this method optimizes the worst-case performance over a distributionally robust uncertainty set using explicit bias supervision.
Main Results.The results of our evaluation on CelebA are presented in Tab. 1 and Tab. 2. We show how CFix outperforms all competitors across all evaluated scenarios, achieving higher unbiased and worst-group accuracy metrics.Specifically, our method outperforms the bias-supervised approach Group DRO and state-of-the-art BPA in both metrics.Additionally, our optimization process demonstrates greater stability across different runs.The worst-group accuracy improvement is noteworthy, given that other approaches, such as George and Group DRO, prioritize maximizing worstcase group accuracy, which is not the focus of our approach.
In more detail, our method achieves an average improvement of +0.74% in average accuracy and +3.27% in worst accuracy (+13.63% for Wearing Necklace target) compared to the previous unsupervised state-of-the-art.Additionally, experiments on Waterbirds confirm the effectiveness of our approach even in a controlled environment, as shown in Tab. 3. We observed improvements in the worst-case performance for Object (bird) and Place (backgorund) classification, with gains of +2.64% and +7.45%, respectively.

Model Analysis
Ratio of the Bias-Aligned Samples.To showcase the resilience of different approaches to distribution shift, we have designed an experiment where we selectively remove biasaligned samples from the training dataset while keeping the test set unchanged.Our study focuses on the Waterbirds dataset, where we remove minority groups from the training set (all waterbirds in a land background and vice versa).
The worst-group accuracy on the test set is reported across epochs in Fig. 3a.Our results indicate that CFix outperforms BPA on minority groups not present in the training distribution.This provides evidence that the proposed structural loss enhances the overall worst-case generalization performance.
On the Effects of Pre-Training.The feature extractor F, used for clustering and backbone initialization, is a crucial component in our pipeline.To verify the effectiveness of our approach on CelebA while altering the backbone, ResNet-18 pre-trained on ImageNet has been employed to initialize F. We conducted identical experiments for the best and worst target attributes as specified in Tab. 1.The outcomes illustrated in Fig. 3b indicate minimal changes in the overall performance with a slight improvement when utilizing the latter setting.These results suggest that it is possible to leverage bias-aligned signals without requiring an Empirical . Visualization of the class activation maps generated by Grad-CAM [38], Grad-CAM++ [8], Ablation-CAM [34], HiResCAM [13], and LayerCam [22] for the ERM and the proposed debiased approach targeting the Wearing Necklace attribute.
Risk Minimization (ERM) model for each target.Notably, we employed the same self-supervised encoder F to perform clustering for all CelebA experiments, while others trained a separate ERM model each target.Refer to the supplementary material for ERM pre-training experiments.Feature Space Visualization.In Fig. 3c and Fig. 3d, we present visualizations of UMAP [29] projections on the CelebA dataset for the "Wearing Necklace" classification.Specifically, we visualize only negative examples (Wearing Necklace = false) to effectively show the feature space at the initial and final epoch using CFix.Our observations suggest that our proposed model successfully achieves a smoother feature space within the same class, mitigating the presence of large clusters caused by shortcuts solutions.Other methods aim to mix the bias attributes in the feature space to improve worst-case generalization.In contrast, our findings advocate that good worst-group and average generalization properties can be achieved even when the bias attributes are separable in the feature space.Therefore, unlearning the bias attribute is not always necessary to avoid shortcuts and achieve good model performance.Model Explainability.The experiments conducted in Fig. 4 explore the interpretability of the proposed debiased approach in contrast to the classical Empirical Risk Minimization (ERM) method.Specifically, we utilize several explainability techniques to visualize the class activation maps of Wearing Necklace target on CelebA.For instance, the ERM method emphasizes features not correlated with the target attribute, while CFix prioritizes the regions more indicative of the regions of interest.These results suggest that our method mitigates unintended solutions, providing more meaningful explanations of the decision-making process.

Conclusion
Mitigating model shortcuts without directly observing bias attributes is a challenging and relatively unexplored task in achieving bias removal in deep networks.Previous research has attempted to address this problem by modifying the target objective using pseudo-groups identified through ERM pre-training.However, our empirical findings indicate that this approach could be suboptimal in real-world scenarios.Our key contribution is the recognition that empirical cluster error can serve as a proxy for identifying samples likely to be affected by the inductive bias of deep networks.By leveraging this insight, CFix effectively upweights such samples, improving the worst-case and average generalization for protected groups across multiple standard benchmarks.Our study demonstrates that ClusterFix and the insights gained from experimental results offer a robust foundation for advancing worst-case generalization and algorithmic fairness without relying on demographic data.

Broader Impact and Limitations
It is crucial to prioritize fairness above performance and always be aware of potential bias, especially in fields such as healthcare and facial recognition.Our novel training approach enhance the worst-case performance for underrepresented groups without relying on their information.Further research should validate whether the proposed approach is valid in various real-world applications beyond the datasets considered in our paper.

Figure 1 .
Figure1.An illustrative scenario depicting dataset partitions biased towards gender attributes.We propose that challenging clusters for classification may leverage significant samples that do not possess the same protected attribute.We emphasize the need to pay closer attention to such distributions in order to address and mitigate model shortcuts.

Figure 3 .
Figure 3. Robustness evaluation of models trained without critical samples (a) and the effectiveness of CFix backbone pre-training on unbiased accuracy measurement (b).(c) and (d) are UMAP projections visualizing CFix feature space (Wearing Necklace = false) at the initial and final epoch.Blue and orange colors represent male and female gender values, respectively.