La risorsa di Italiano Standard ad alta variabilità linguistica per misurare la peculiarità di un corpus

De Gasperis, Giovanni; Pavone, Pasquale; Bolasco, Sergio

In the automatic analysis of texts, statistical-linguistic resources add indisputable value, both for the grammatical tagging of the forms in a corpus and for extracting content according to its over- or under-use relative to the occurrences in a frequency lexicon, which identifies the peculiar language of the corpus. To this end, we build a corpus able to estimate frequencies in so-called Standard Italian, understood as a set of various linguistic typologies. This resource, usable in the TaLTaC software, is large enough to lend itself to multiple uses, both as a whole and in its individual typologies, each measurable in its own right. The first part of the work describes the composition of the lexicon obtained from the corpus. The second tests the resource against a collection of tweets on Russia's war in Ukraine, measuring the collection's specific thematic peculiarity.
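The idea of over- or under-use relative to a frequency lexicon can be illustrated with a minimal sketch. It assumes a standardized deviation of relative frequencies, (f - f_ref) / sqrt(f_ref), as the peculiarity score; the abstract does not state which measure the authors or TaLTaC actually use. The function names, the `floor` parameter, and the toy frequencies are all hypothetical and serve only as illustration.

```python
from collections import Counter
from math import sqrt


def relative_freqs(tokens):
    """Relative frequency of each form in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {form: count / total for form, count in counts.items()}


def peculiarity(corpus_tokens, reference_freqs, floor=1e-6):
    """Standardized deviation of each form from a reference lexicon.

    Positive scores suggest over-use in the corpus, negative scores
    under-use. `floor` is an assumed fallback frequency for forms
    missing from the reference lexicon (avoids division by zero).
    """
    corpus_freqs = relative_freqs(corpus_tokens)
    scores = {}
    # Score every form seen in either the corpus or the reference.
    for form in set(corpus_freqs) | set(reference_freqs):
        f = corpus_freqs.get(form, 0.0)
        f_ref = reference_freqs.get(form, floor)
        scores[form] = (f - f_ref) / sqrt(f_ref)
    return scores


# Toy reference lexicon with invented relative frequencies (illustration only).
reference = {"la": 0.05, "di": 0.04, "e": 0.03, "guerra": 0.0004, "pace": 0.0003}
tweets = "la guerra di la guerra pace".split()

scores = peculiarity(tweets, reference)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the thematic form "guerra" ranks well above the function word "la", even though both occur twice, because its reference frequency is far lower. The form "e", present in the reference but absent from the tweets, receives a negative score.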

La risorsa di Italiano Standard ad alta variabilità linguistica per misurare la peculiarità di un corpus / De Gasperis, G.; Pavone, P.; Bolasco, S. - 1 (2022), pp. 274-281. (JADT 2022, Napoli, 6-8 July 2022).

Publication year: 2022
Conference title: JADT 2022
Conference location: Napoli
Conference dates: 6-8 July 2022
Volume no.: 1
First page: 274
Last page: 281
All authors: De Gasperis, Giovanni; Pavone, Pasquale; Bolasco, Sergio
Type: Conference proceedings paper

Files in this record:

File	Access	Type	Size	Format
Paper jadt 2022.pdf	Open access	VOR - Version published by the publisher	2.31 MB	Adobe PDF

Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise indicated.