DCT-Former: Efficient Self-Attention with Discrete Cosine Transform

Since their introduction, Transformer architectures have emerged as the dominant architectures for both natural language processing and, more recently, computer vision applications. An intrinsic limitation of this family of “fully-attentive” architectures arises from the computation of the dot-product attention, whose memory consumption and number of operations both grow as O(n^2), where n is the input sequence length, which limits applications that require modeling very long sequences. Several approaches have been proposed in the literature to mitigate this issue, with varying degrees of success. Our idea takes inspiration from lossy data compression (such as the JPEG algorithm) to derive an approximation of the attention module by leveraging the properties of the Discrete Cosine Transform. Extensive experiments show that our method uses less memory for the same performance, while also drastically reducing inference time. Moreover, we expect that the results of our research might serve as a starting point for a broader family of deep neural models with reduced memory footprint. The implementation will be made publicly available at https://github.com/cscribano/DCT-Former-Public.
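
The abstract does not describe the construction in detail, so the following is only a rough illustration of the general idea and not the authors' implementation (that is the linked repository). It assumes, hypothetically, that queries, keys, and values are compressed along the sequence axis by keeping the m lowest-frequency DCT-II coefficients, so that the attention score matrix is m x m rather than n x n. The names `dct_matrix` and `dct_compressed_attention`, and the choice of m, are assumptions made for this sketch.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of shape (n, n)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # position index
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0, :] /= np.sqrt(2.0)    # rescale the DC row so that D @ D.T = I
    return D


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dct_compressed_attention(Q, K, V, m):
    """Hypothetical sketch: attention computed on m DCT coefficients.

    Q, K, V have shape (n, d). Only the m lowest frequencies along the
    sequence axis are kept, in the spirit of JPEG-style lossy compression.
    """
    n, d = Q.shape
    D = dct_matrix(n)[:m]                      # (m, n) truncated transform
    Qc, Kc, Vc = D @ Q, D @ K, D @ V           # each (m, d)
    scores = softmax(Qc @ Kc.T / np.sqrt(d))   # (m, m) instead of (n, n)
    return D.T @ (scores @ Vc)                 # map back to length n


rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 16)) for _ in range(3))
out = dct_compressed_attention(Q, K, V, m=16)
print(out.shape)  # (64, 16)
```

With n = 64 and m = 16, the score matrix in this sketch has 256 entries instead of 4096, which is the kind of saving the O(n^2) remark in the abstract refers to. The result is an approximation, and how closely it matches full attention depends on how much of the signal lies in the discarded frequencies.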

DCT-Former: Efficient Self-Attention with Discrete Cosine Transform / Scribano, C.; Franchini, G.; Prato, M.; Bertogna, M.. - In: JOURNAL OF SCIENTIFIC COMPUTING. - ISSN 1573-7691. - 94:(2023), pp. 1-25. [10.1007/s10915-023-02125-5]

C. Scribano; G. Franchini; M. Prato; M. Bertogna

2023

Year of publication	2023
Journal	JOURNAL OF SCIENTIFIC COMPUTING
Volume	94
First page	1
Last page	25
DOI	https://dx.doi.org/10.1007/s10915-023-02125-5
WoS ID	WOS:000930945100004
Scopus ID	2-s2.0-85147665351
All authors	Scribano, C.; Franchini, G.; Prato, M.; Bertogna, M.
Type	Journal article

Files in this item:

File	Access	Type	License	Size	Format
preprint.pdf	Open access	AO - Author's original version submitted for publication	[IR] creative-commons	1.17 MB	Adobe PDF
18190552.pdf	Restricted access	VOR - Version published by the publisher	[IR] closed	447.99 kB	Adobe PDF

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact IRIS Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1295556
