Data–driven Semiotics and Semiotics–driven Machine Learning / Sanna, Leonardo. - In: LEXIA. - ISSN 1720-5298. - 2020:33-34(2020), pp. 89-107. [10.4399/97888255354266]
Data–driven Semiotics and Semiotics–driven Machine Learning
Leonardo Sanna
Writing – Review & Editing
2020
Abstract
There is now a vast and growing variety of digital data. Despite their obvious relevance for the humanities and the social sciences, these massive quantities of data, usually referred to as “big data”, are mainly selected and analyzed with the tools of computer science and statistics. The paper proposes a theoretical and practical approach to the analysis of large quantities of data within the field of semiotic analysis. The main claim is that semiotics should enter into dialogue with IT and statistics, which are essential for dealing with the vastness and continuous variability of data. In particular, machine learning could become especially useful from a semiotic perspective. In this work, we use a machine learning technique from Natural Language Processing (NLP) to create a vector space based on the probabilities of co–occurrence of words. From a distributional semantics perspective, this space is interpreted as a representation of the semantic relations among words. We then present two directions in which the joint effort of semiotics and machine learning can be understood. In the first, we propose a case study of semiotics–driven machine learning, in which we create a dataset starting from a semiotic analysis. In the second, we present an example of data–driven semiotics, where semiotic tools are applied to an existing dataset that was not built for semiotic purposes. The two directions should not be understood as a dichotomy, but rather as parts of a joint effort in which semiotics interacts with machine learning and machine learning interacts with qualitative analysis.
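The abstract does not name the specific technique or its parameters. As a rough, hypothetical illustration of how a vector space can be derived from word co-occurrence probabilities, the sketch below counts co-occurrences within a fixed window and weights them with positive pointwise mutual information (PPMI). This is a simple count-based stand-in, not necessarily the technique used in the paper. The function names, the window size of 2 and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each (word, context) pair appears within `window` tokens."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def ppmi_vectors(counts):
    """Weight counts by PPMI: max(0, log(P(w, c) / (P(w) * P(c)))).

    Only strictly positive values are kept, so each word becomes a sparse
    vector over its most informative contexts.
    """
    total = sum(sum(ctx.values()) for ctx in counts.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in counts.items()}
    context_totals = Counter()
    for ctx in counts.values():
        context_totals.update(ctx)
    vectors = {}
    for w, ctx in counts.items():
        vec = {}
        for c, n in ctx.items():
            # P(w,c) / (P(w)P(c)) simplifies to n * total / (count(w) * count(c))
            pmi = math.log((n * total) / (word_totals[w] * context_totals[c]))
            if pmi > 0:
                vec[c] = pmi
        vectors[w] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, far too small to yield meaningful semantics.
corpus = [
    "the cat chases the mouse",
    "the dog chases the cat",
    "the mouse eats cheese",
    "the dog eats meat",
]
vectors = ppmi_vectors(cooccurrence_counts(corpus))
print(f"cos(cat, dog) = {cosine(vectors['cat'], vectors['dog']):.3f}")
```

Cosine similarity is the usual way to compare words in such a space: words that share informative contexts end up close together. In the distributional semantics perspective the abstract adopts, this closeness is what gets interpreted as a semantic relation. A real study would rely on a much larger corpus and, typically, a trained embedding model rather than raw PPMI counts.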
File | Type | Size | Format | Access
---|---|---|---|---
Data–driven Semiotics and Semiotics–driven Machine Learning.pdf | Publisher's version | 2.93 MB | Adobe PDF | Restricted
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise stated.