Do Humans and Deep Convolutional Neural Networks Use Visual Information Similarly for the Categorization of Natural Scenes?

The investigation of visual categorization has recently been aided by the introduction of deep convolutional neural networks (CNNs), which achieve unprecedented accuracy in picture classification after extensive training. Even if the architecture of CNNs is inspired by the organization of the visual brain, the similarity between CNN and human visual processing remains unclear. Here, we investigated this issue by engaging humans and CNNs in a two-class visual categorization task. To this end, pictures containing animals or vehicles were modified to contain only low/high spatial frequency (HSF) information, or were scrambled in the phase of the spatial frequency spectrum. For all types of degradation, accuracy increased as degradation was reduced for both humans and CNNs; however, the thresholds for accurate categorization varied between humans and CNNs. More remarkable differences were observed for HSF information compared to the other two types of degradation, both in terms of overall accuracy and image-level agreement between humans and CNNs. The difficulty with which the CNNs were shown to categorize high-passed natural scenes was reduced by picture whitening, a procedure which is inspired by how visual systems process natural images. The results are discussed concerning the adaptation to regularities in the visual environment (scene statistics); if the visual characteristics of the environment are not learned by CNNs, their visual categorization may depend only on a subset of the visual information on which humans rely, for example, on low spatial frequency information.

Do Humans and Deep Convolutional Neural Networks Use Visual Information Similarly for the Categorization of Natural Scenes? / De Cesarei, A.; Cavicchi, S.; Cristadoro, G.; Lippi, M.. - In: COGNITIVE SCIENCE. - ISSN 0364-0213. - 45:6(2021), pp. 1-27. [10.1111/cogs.13009]

Do Humans and Deep Convolutional Neural Networks Use Visual Information Similarly for the Categorization of Natural Scenes?

De Cesarei A.;Cavicchi S.;Cristadoro G.;Lippi M.

2021

Abstract

The investigation of visual categorization has recently been aided by the introduction of deep convolutional neural networks (CNNs), which achieve unprecedented accuracy in picture classification after extensive training. Even if the architecture of CNNs is inspired by the organization of the visual brain, the similarity between CNN and human visual processing remains unclear. Here, we investigated this issue by engaging humans and CNNs in a two-class visual categorization task. To this end, pictures containing animals or vehicles were modified to contain only low/high spatial frequency (HSF) information, or were scrambled in the phase of the spatial frequency spectrum. For all types of degradation, accuracy increased as degradation was reduced for both humans and CNNs; however, the thresholds for accurate categorization varied between humans and CNNs. More remarkable differences were observed for HSF information compared to the other two types of degradation, both in terms of overall accuracy and image-level agreement between humans and CNNs. The difficulty with which the CNNs were shown to categorize high-passed natural scenes was reduced by picture whitening, a procedure which is inspired by how visual systems process natural images. The results are discussed concerning the adaptation to regularities in the visual environment (scene statistics); if the visual characteristics of the environment are not learned by CNNs, their visual categorization may depend only on a subset of the visual information on which humans rely, for example, on low spatial frequency information.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2021
			
	Rivista
	
				COGNITIVE SCIENCE
			
	N° del Volume
	
				45
			
	Fascicolo
	
				6
			
	Pagina iniziale
	
				1
			
	Pagina finale
	
				27
			
	Codice DOI
	
				https://dx.doi.org/10.1111/cogs.13009
			
	Codice WoS
	
				WOS:000665833500011
			
	Codice Scopus
	
				2-s2.0-85108579478
			
	Codice PubMed
	
				34170027
			
	Citazione
	
				Do Humans and Deep Convolutional Neural Networks Use Visual Information Similarly for the Categorization of Natural Scenes? / De Cesarei, A.; Cavicchi, S.; Cristadoro, G.; Lippi, M.. - In: COGNITIVE SCIENCE. - ISSN 0364-0213. - 45:6(2021), pp. 1-27. [10.1111/cogs.13009]
			
	Tutti gli autori
	
						De Cesarei, A.; Cavicchi, S.; Cristadoro, G.; Lippi, M.
					
	Tipologia
	
				Articolo su rivista

File in questo prodotto:

File	Dimensione	Formato
CogSci2021.pdf Open access Tipologia: VOR - Versione pubblicata dall'editore Dimensione 1.22 MB Formato Adobe PDF Visualizza/Apri	1.22 MB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11380/1251779

Citazioni

1

13

10

social impact