Nowadays there is a huge and growing variety of digital data. Despite the obvious relevance for the humanities and the social sciences, these massive quantities of data, usually defined as “big data”, are mainly selected and ana- lyzed using computer science and statistics. The paper proposes a theoretical and practical approach to the analysis of large quantities of data within the field of semiotic analysis. The main claim is that semiotics should dialogue with IT and statistics, that are essential to deal with the vastness and continuous variability of data. In particular, machine learning might become really useful from a semiotic perspective. In this work, we use a machine learning technique that is used in Natural Language Processing (NLP), to create a vector space based on probabilities of co–occurrences of words. In a distributional semantics perspective, this space is interpreted as a representation of semantic relations among words. We present then two directions in which we could intend the joint effort of semiotics and machine learning. In the first case, we propose a case study of semiotics–driven machine learning, in which we create a dataset starting from a semiotic analysis. In the second case, we present an example of data–driven semiotics, were the semiotic tools are used on an existing dataset, that was not build with semiotic scopes. The two directions have not to be intended as a dichotomy but instead as a part of a joint effort where semiotics interacts with machine learning and machine learning interacts with qualitative analysis.

Data–driven Semiotics and Semiotics–driven Machine Learning / Sanna, Leonardo. - In: LEXIA. - ISSN 1720-5298. - 2020:33-34(2020), pp. 89-107. [10.4399/97888255354266]

Data–driven Semiotics and Semiotics–driven Machine Learning

Leonardo Sanna
Writing – Review & Editing
2020

Abstract

Nowadays there is a huge and growing variety of digital data. Despite the obvious relevance for the humanities and the social sciences, these massive quantities of data, usually defined as “big data”, are mainly selected and ana- lyzed using computer science and statistics. The paper proposes a theoretical and practical approach to the analysis of large quantities of data within the field of semiotic analysis. The main claim is that semiotics should dialogue with IT and statistics, that are essential to deal with the vastness and continuous variability of data. In particular, machine learning might become really useful from a semiotic perspective. In this work, we use a machine learning technique that is used in Natural Language Processing (NLP), to create a vector space based on probabilities of co–occurrences of words. In a distributional semantics perspective, this space is interpreted as a representation of semantic relations among words. We present then two directions in which we could intend the joint effort of semiotics and machine learning. In the first case, we propose a case study of semiotics–driven machine learning, in which we create a dataset starting from a semiotic analysis. In the second case, we present an example of data–driven semiotics, were the semiotic tools are used on an existing dataset, that was not build with semiotic scopes. The two directions have not to be intended as a dichotomy but instead as a part of a joint effort where semiotics interacts with machine learning and machine learning interacts with qualitative analysis.
2020
giu-2020
2020
33-34
89
107
Data–driven Semiotics and Semiotics–driven Machine Learning / Sanna, Leonardo. - In: LEXIA. - ISSN 1720-5298. - 2020:33-34(2020), pp. 89-107. [10.4399/97888255354266]
Sanna, Leonardo
File in questo prodotto:
File Dimensione Formato  
Data–driven Semiotics and Semiotics–driven Machine Learning.pdf

Accesso riservato

Tipologia: Versione pubblicata dall'editore
Dimensione 2.93 MB
Formato Adobe PDF
2.93 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

Licenza Creative Commons
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11380/1220855
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? ND
social impact