Generated Scalable Vector Graphics (SVG) images demand evaluation criteria tuned to their symbolic and vectorial nature – criteria that existing metrics such as FID, LPIPS, or CLIPScore fail to satisfy. In this paper, we introduce SVGauge, the first human-aligned, reference-based metric for text-to-SVG generation. SVGauge jointly measures (i) visual fidelity, obtained by extracting SigLIP image embeddings and refining them with PCA and whitening for domain alignment, and (ii) semantic consistency, captured by comparing BLIP-2-generated captions of the SVGs against the original prompts in the combined space of SBERT and TF-IDF. Evaluation on the proposed SHE benchmark shows that SVGauge attains the highest correlation with human judgments and reproduces system-level rankings of eight zero-shot LLM-based generators more faithfully than existing metrics. Our results highlight the necessity of vector-specific evaluation and provide a practical tool for benchmarking future text-to-SVG generation models.
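As a rough illustration of the semantic-consistency component described above, the sketch below scores a generated caption against the original prompt by blending a dense-embedding cosine similarity with a TF-IDF cosine similarity. It is a minimal sketch under stated assumptions, not the authors' implementation. The weighted sum, the `alpha` parameter, the smoothed IDF formula, and all function names are hypothetical, and the dense vectors stand in for SBERT embeddings that would be computed elsewhere. The abstract does not specify how the two spaces are actually combined.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs only (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Build sparse TF-IDF vectors (dicts) with a smoothed IDF term.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine_sparse(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_dense(u, v):
    # Cosine similarity between two equal-length dense vectors.
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_consistency(prompt, caption, prompt_emb, caption_emb, alpha=0.5):
    # Hypothetical combination: weighted sum of dense and lexical similarity.
    # prompt_emb / caption_emb are placeholders for sentence embeddings.
    prompt_vec, caption_vec = tfidf_vectors([prompt, caption])
    lexical = cosine_sparse(prompt_vec, caption_vec)
    dense = cosine_dense(prompt_emb, caption_emb)
    return alpha * dense + (1.0 - alpha) * lexical
```

A call such as `semantic_consistency("a red circle", "a red circle on white", emb_p, emb_c)` returns a value between 0 and 1 when the embeddings have non-negative similarity. In practice the caption would come from a captioning model applied to the rendered SVG, and the embeddings from a sentence encoder.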
SVGauge: Towards Human-Aligned Evaluation for SVG Generation / Zini, L., Frigieri, E., Aloscari, S., Generali, M., Dodi, L., Dosen, R., Baraldi, L. - 16167 LNCS:(2026), pp. 181-193. (23rd International Conference on Image Analysis and Processing, ICIAP 2025, Italy, 2025) [10.1007/978-3-032-10185-3_15].
SVGauge: Towards Human-Aligned Evaluation for SVG Generation
Leonardo Zini (Investigation); Elia Frigieri (Investigation); Sebastiano Aloscari (Investigation)
2026
| File | Size | Format |
|---|---|---|
| 3Q9Ftt-2509.07127v1.pdf (Open access; Type: VOR - version published by the publisher) | 1.02 MB | Adobe PDF |

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact Supporto Iris.




