Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions

Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a relevant boost to the digitization of handwritten documents and reuse of their content. The task becomes even more challenging when dealing with historical documents due to the variability of the writing style and degradation of the page quality. State-of-the-art HTR approaches typically couple recurrent structures for sequence modeling with Convolutional Neural Networks for visual feature extraction. Since convolutional kernels are defined on fixed grids and focus on all input pixels independently while moving over the input image, this strategy disregards the fact that handwritten characters can vary in shape, scale, and orientation even within the same document and that the ink pixels are more relevant than the background ones. To cope with these specific HTR difficulties, we propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text. We design two deformable architectures and conduct extensive experiments on both modern and historical datasets. Experimental results confirm the suitability of deformable convolutions for the HTR task.
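To make the contrast between fixed-grid and deformable kernels concrete, the following is a minimal pure-Python sketch of a single-channel 3x3 deformable convolution. It is an illustration under stated assumptions, not the architecture used in the paper. The function names (`bilinear`, `deformable_conv2d`, `zero_offsets`) are hypothetical, and the offsets are passed in by hand, whereas in a deformable layer they would normally be predicted from the input by an additional learned convolution.

```python
import math


def bilinear(img, y, x):
    """Sample img (a list of rows) at a fractional position (y, x).

    Positions outside the image contribute 0 (zero padding).
    """
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value


def deformable_conv2d(img, kernel, offsets):
    """Single-channel 3x3 deformable convolution, stride 1, zero padding 1.

    offsets[i][j] holds 9 (dy, dx) pairs, one per kernel tap, for the
    output position (i, j). With all-zero offsets this reduces to a
    standard fixed-grid convolution (cross-correlation, as in most
    deep learning frameworks).
    """
    h, w = len(img), len(img[0])
    taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k, (ky, kx) in enumerate(taps):
                dy, dx = offsets[i][j][k]  # assumed given; normally learned
                acc += kernel[ky + 1][kx + 1] * bilinear(
                    img, i + ky + dy, j + kx + dx
                )
            out[i][j] = acc
    return out


def zero_offsets(h, w):
    """Offsets that leave every kernel tap on the regular grid."""
    return [[[(0.0, 0.0)] * 9 for _ in range(w)] for _ in range(h)]
```

With `zero_offsets`, every tap stays on the regular grid and the result matches an ordinary convolution. Non-zero offsets move each tap to a fractional position, which is read through bilinear interpolation; this is what lets the sampling pattern follow the shape, scale, and orientation of the strokes.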

Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions / Cascianelli, Silvia; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita. - In: INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION. - ISSN 1433-2833. - 25:3(2022), pp. 207-217. [10.1007/s10032-022-00401-y]

Publication year: 2022
Journal: INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION
Volume: 25
Issue: 3
First page: 207
Last page: 217
DOI: https://dx.doi.org/10.1007/s10032-022-00401-y
WoS ID: WOS:000792581000001
Scopus ID: 2-s2.0-85129706484
All authors: Cascianelli, Silvia; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
Type: Journal article

Files in this record:

File	Size	Format
2208.08109.pdf (Open Access since 1 June 2023; type: AAM, author's version revised and accepted for publication)	2.98 MB	Adobe PDF


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact Iris Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1272297
