Generated Scalable Vector Graphics (SVG) images demand evaluation criteria tuned to their symbolic and vectorial nature – criteria that existing metrics such as FID, LPIPS, or CLIPScore fail to satisfy. In this paper, we introduce SVGauge, the first human-aligned, reference-based metric for text-to-SVG generation. SVGauge jointly measures (i) visual fidelity, obtained by extracting SigLIP image embeddings and refining them with PCA and whitening for domain alignment, and (ii) semantic consistency, captured by comparing BLIP-2-generated captions of the SVGs against the original prompts in the combined space of SBERT and TF-IDF. Evaluation on the proposed SHE benchmark shows that SVGauge attains the highest correlation with human judgments and reproduces system-level rankings of eight zero-shot LLM-based generators more faithfully than existing metrics. Our results highlight the necessity of vector-specific evaluation and provide a practical tool for benchmarking future text-to-SVG generation models.
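The two-branch scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the SBERT and TF-IDF similarities are combined or how the visual and semantic branches are fused, so the linear blends and the weights `text_mix` and `branch_mix` below are assumptions. The dense similarities (SigLIP image embeddings after PCA and whitening, SBERT sentence embeddings) are assumed to be computed elsewhere and passed in as plain numbers, and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def sparse_cosine(a, b):
    """Cosine similarity between two sparse term -> weight dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_score(visual_sim, dense_text_sim, lexical_sim,
                   text_mix=0.5, branch_mix=0.5):
    """Blend the similarities into one score.

    ASSUMPTION: linear blends with equal weights; the abstract does not
    specify the fusion rule, so these defaults are illustrative only.
    """
    semantic = text_mix * dense_text_sim + (1.0 - text_mix) * lexical_sim
    return branch_mix * visual_sim + (1.0 - branch_mix) * semantic


# Hypothetical prompt and two captions standing in for BLIP-2 output.
prompt = "a red circle on a white background"
caption_close = "a red circle on white background"
caption_far = "a blue square with a green border"

vec_prompt, vec_close, vec_far = tfidf_vectors(
    [prompt, caption_close, caption_far]
)
lexical_close = sparse_cosine(vec_prompt, vec_close)
lexical_far = sparse_cosine(vec_prompt, vec_far)
```

In this toy case the caption that matches the prompt receives the higher lexical similarity, which in turn raises the combined score when the other inputs are held fixed.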

SVGauge: Towards Human-Aligned Evaluation for SVG Generation / Zini, L., Frigieri, E., Aloscari, S., Generali, M., Dodi, L., Dosen, R., Baraldi, L.. - 16167 LNCS:(2026), pp. 181-193. (23rd International Conference on Image Analysis and Processing, ICIAP 2025 ita 2025) [10.1007/978-3-032-10185-3_15].

SVGauge: Towards Human-Aligned Evaluation for SVG Generation

Leonardo Zini (Investigation); Elia Frigieri (Investigation); Sebastiano Aloscari (Investigation)
2026

Publication year: 2026
Conference: 23rd International Conference on Image Analysis and Processing, ICIAP 2025
Conference location: ita
Conference year: 2025
Volume: 16167 LNCS
Pages: 181-193
Zini, Leonardo; Frigieri, Elia; Aloscari, Sebastiano; Generali, Marcello; Dodi, Lorenzo; Dosen, Robert; Baraldi, Lorenzo
Files in this record:
File: 3Q9Ftt-2509.07127v1.pdf
Access: Open access
Type: VOR - Version of Record (version published by the publisher)
Size: 1.02 MB
Format: Adobe PDF
Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1412548