Binarizing Documents by Leveraging both Space and Frequency

Quattrini, F.; Pippi, V.; Cascianelli, S.; Cucchiara, R.

doi:10.1007/978-3-031-70543-4_1

Document Image Binarization is a well-known problem in Document Analysis and Computer Vision, although it is far from being solved. One of the main challenges of this task is that documents generally exhibit degradations and acquisition artifacts that can greatly vary throughout the page. Nonetheless, even when dealing with a local patch of the document, taking into account the overall appearance of a wide portion of the page can ease the prediction by enriching it with semantic information on the ink and background conditions. In this respect, approaches able to model both local and global information have been proven suitable for this task. In particular, recent applications of Vision Transformer (ViT)-based models, able to model short and long-range dependencies via the attention mechanism, have demonstrated their superiority over standard Convolution-based models, which instead struggle to model global dependencies. In this work, we propose an alternative solution based on the recently introduced Fast Fourier Convolutions, which overcomes the limitation of standard convolutions in modeling global information while requiring fewer parameters than ViTs. We validate the effectiveness of our approach via extensive experimental analysis considering different types of degradations.
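The contrast the abstract draws between standard convolutions and a frequency-domain operator can be illustrated with a small NumPy sketch. It is not the authors' architecture: `spectral_transform`, `local_conv3x3`, the fixed `weight` matrix, and the 32x32 random patch are all hypothetical names and values chosen for illustration, and the spectral branch only loosely follows the Fourier unit commonly described for Fast Fourier Convolutions (real FFT, pointwise channel mixing with ReLU, inverse FFT), with batch normalization omitted. The sketch perturbs a single input pixel and counts how many output pixels react under each operator.

```python
import numpy as np


def local_conv3x3(x, kernel):
    """Plain zero-padded 3x3 convolution on a single-channel (H, W) image."""
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def spectral_transform(x, weight):
    """Illustrative global branch. x: (C, H, W); weight: (2*C_out, 2*C)."""
    _, h, w = x.shape
    # Real 2-D FFT over the spatial axes -> complex array (C, H, W // 2 + 1).
    freq = np.fft.rfft2(x, axes=(-2, -1))
    # Stack real and imaginary parts along the channel axis -> (2C, H, W // 2 + 1).
    stacked = np.concatenate([freq.real, freq.imag], axis=0)
    # A 1x1 convolution in the frequency domain is per-frequency channel mixing.
    mixed = np.einsum("oc,chw->ohw", weight, stacked)
    mixed = np.maximum(mixed, 0.0)  # ReLU
    c_out = weight.shape[0] // 2
    out_freq = mixed[:c_out] + 1j * mixed[c_out:]
    # Inverse real FFT back to the original spatial size.
    return np.fft.irfft2(out_freq, s=(h, w), axes=(-2, -1))


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 32, 32))   # hypothetical single-channel patch
x_perturbed = x.copy()
x_perturbed[0, 16, 16] += 1.0          # change exactly one pixel

kernel = np.full((3, 3), 1.0 / 9.0)    # assumed box filter
weight = np.array([[1.0, -0.5],
                   [0.5, 1.0]])        # assumed mixing weights (C = C_out = 1)

local_diff = ~np.isclose(local_conv3x3(x[0], kernel),
                         local_conv3x3(x_perturbed[0], kernel))
global_diff = ~np.isclose(spectral_transform(x, weight),
                          spectral_transform(x_perturbed, weight))

local_changed = int(np.count_nonzero(local_diff))    # 9: the 3x3 neighbourhood
global_changed = int(np.count_nonzero(global_diff))  # spread across the patch
print(local_changed, global_changed)
```

Because every frequency coefficient depends on every input pixel, a pointwise operation in the frequency domain gives a single layer an image-wide receptive field, whereas a 3x3 convolution only reaches the immediate neighbourhood. The mixing matrix here has only `2*C_out * 2*C` entries, which is in the spirit of the parameter efficiency the abstract mentions, though the sketch makes no claim about the actual model size.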

Binarizing Documents by Leveraging both Space and Frequency / Quattrini, F.; Pippi, V.; Cascianelli, S.; Cucchiara, R. - 14806 (2024), pp. 3-22. (Paper presented at the 18th International Conference on Document Analysis and Recognition, ICDAR 2024, held in Athens, Greece, August 30 - September 4, 2024) [10.1007/978-3-031-70543-4_1].


Publication year: 2024
Conference title: 18th International Conference on Document Analysis and Recognition, ICDAR 2024
Conference location: Athens, Greece
Conference dates: August 30 - September 4, 2024
DOI: https://dx.doi.org/10.1007/978-3-031-70543-4_1
WoS code: WOS:001336394400001
Scopus code: 2-s2.0-85204649009
Series: Lecture Notes in Computer Science
Volume number: 14806
First page: 3
Last page: 22
All authors: Quattrini, F.; Pippi, V.; Cascianelli, S.; Cucchiara, R.
Type: Conference proceedings paper


Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1363929
