What do we learn by applying multiple methods in topic detection? A comparative analysis on a large online dataset about mobility electrification / Alboni, Fabrizio; Russo, Margherita; Pavone, Pasquale. - 1:(2022), pp. 446-453. (Paper presented at SIS 2022 – 51st Scientific Meeting of the Italian Statistical Society, held in Caserta, 22-24 June 2022).

What do we learn by applying multiple methods in topic detection? A comparative analysis on a large online dataset about mobility electrification

Fabrizio Alboni; Margherita Russo; Pasquale Pavone
2022

Abstract

Identifying the topics covered in a corpus is one of the central issues in automatic text analysis. The objective of our paper is to contribute to the comparative analysis of different topic-detection methods. In particular, we compare the results obtained by applying the most common topic-identification methods to the same corpus: a large, original textual database built from an e-mobility newsletter. To compare the methods, we use two criteria. First, the semantic coherence of each model is evaluated with the UMass score and pointwise mutual information (PMI). Second, the degree of association between the topics identified by the different models is assessed with a heat-map and Cramér's V.
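A minimal sketch of the two comparison criteria named in the abstract, offered under stated assumptions and not as the authors' implementation. It assumes documents are lists of tokens, that coherence counts come from document co-occurrence in the same corpus (PMI is often estimated instead from sliding windows over a reference corpus), and that two models are compared through each document's dominant topic. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def doc_frequencies(docs, words):
    """Count document frequency D(w) and co-document frequency D(w_i, w_j)
    for `words` over `docs` (each doc is a list of tokens)."""
    vocab = set(words)
    df, co_df = Counter(), Counter()
    for doc in docs:
        present = sorted(set(doc) & vocab)
        df.update(present)
        co_df.update(combinations(present, 2))  # keys are alphabetically sorted pairs
    return df, co_df


def umass_coherence(top_words, docs):
    """UMass: sum over j < i of log((D(w_i, w_j) + 1) / D(w_j)),
    with top_words ordered from most to least probable in the topic.
    Assumes every top word occurs in at least one document."""
    df, co_df = doc_frequencies(docs, top_words)
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            pair = tuple(sorted((top_words[i], top_words[j])))
            score += math.log((co_df[pair] + 1) / df[top_words[j]])
    return score


def pmi_coherence(top_words, docs, eps=1e-12):
    """Mean PMI over all pairs of top words, with probabilities estimated
    from document co-occurrence; eps avoids log(0) for unseen pairs."""
    n = len(docs)
    df, co_df = doc_frequencies(docs, top_words)
    pairs = list(combinations(top_words, 2))
    total = 0.0
    for wi, wj in pairs:
        p_ij = co_df[tuple(sorted((wi, wj)))] / n
        total += math.log((p_ij + eps) / ((df[wi] / n) * (df[wj] / n)))
    return total / len(pairs)


def cramers_v(labels_a, labels_b):
    """Cramér's V between two labelings of the same documents,
    e.g. the dominant topic of each document under two models."""
    n = len(labels_a)
    rows, cols = set(labels_a), set(labels_b)
    if min(len(rows), len(cols)) < 2:
        raise ValueError("each model needs at least two distinct topics")
    table = Counter(zip(labels_a, labels_b))  # contingency table a heat-map would show
    row_tot, col_tot = Counter(labels_a), Counter(labels_b)
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (table[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))


# Hypothetical toy usage
docs = [["battery", "charging"], ["battery"], ["charging", "grid"]]
print(umass_coherence(["battery", "charging"], docs))
print(pmi_coherence(["battery", "charging"], docs))
print(cramers_v([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```

The `table` counter inside `cramers_v` is the contingency table that a heat-map would display; Cramér's V condenses it into a single value between 0 (no association) and 1 (perfect association between the two models' topics).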
Year: 2022
Conference: SIS 2022 – 51st Scientific Meeting of the Italian Statistical Society
Location: Caserta
Dates: 22-24 June 2022
Volume: 1
Pages: 446-453
Files in this item:
What do we Learn SIS 2022.pdf
Open access
Type: Publisher's published version
Size: 6.91 MB
Format: Adobe PDF

Creative Commons License
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise stated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1288867