What do we learn by applying multiple methods in topic detection? A comparative analysis on a large online dataset about mobility electrification / Alboni, Fabrizio; Russo, Margherita; Pavone, Pasquale. - 1:(2022), pp. 446-453. (Paper presented at SIS 2022 – 51st Scientific Meeting of the Italian Statistical Society, held in Caserta, 22-24 June 2022).
What do we learn by applying multiple methods in topic detection? A comparative analysis on a large online dataset about mobility electrification
Fabrizio Alboni; Margherita Russo; Pasquale Pavone
2022
Abstract
Identifying the topics covered in a corpus is one of the central issues in automatic text analysis. This paper contributes to the comparative analysis of topic identification methods: we compare the results obtained by applying the most common methods to the same corpus. The analysis is performed on a large original textual database built from an e-mobility newsletter. We compare the methods on two criteria. First, the semantic consistency of each model is evaluated using the UMass score and pointwise mutual information (PMI). Second, the degree of association between the topics identified by the different models is assessed using a heat map and Cramér's V.

File | Size | Format
---|---|---
What do we Learn SIS 2022.pdf (open access; type: publisher's published version) | 6.91 MB | Adobe PDF
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.