We propose a novel knowledge-based technique for inter-document similarity computation, called Context Semantic Analysis (CSA). Several specialized approaches built on top of specific knowledge base (e.g. Wikipedia) exist in literature, but CSA differs from them because it is designed to be portable to any RDF knowledge base. Our technique relies on a generic RDF knowledge base (e.g. DBpedia and Wikidata) to extract from it a contextual graph and a semantic contextual vector able to represent the context of a document. We show how CSA exploits such Semantic Context Vector to compute inter-document similarity effectively. Moreover, we show how CSA can be effectively applied in the Information Retrieval domain. Experimental results show that our general technique outperforms baselines built on top of traditional methods, and achieves a performance similar to the ones built on top of specific knowledge bases.
Computing inter-document similarity with Context Semantic Analysis / Beneventano, Domenico; Benedetti, Fabio; Bergamaschi, Sonia; Simonini, Giovanni. - In: INFORMATION SYSTEMS. - ISSN 0306-4379. - 80:(2019), pp. 136-147. [10.1016/j.is.2018.02.009]
Computing inter-document similarity with Context Semantic Analysis
Domenico Beneventano
;Fabio Benedetti;Sonia Bergamaschi;Giovanni Simonini
2019
Abstract
We propose a novel knowledge-based technique for inter-document similarity computation, called Context Semantic Analysis (CSA). Several specialized approaches built on top of specific knowledge base (e.g. Wikipedia) exist in literature, but CSA differs from them because it is designed to be portable to any RDF knowledge base. Our technique relies on a generic RDF knowledge base (e.g. DBpedia and Wikidata) to extract from it a contextual graph and a semantic contextual vector able to represent the context of a document. We show how CSA exploits such Semantic Context Vector to compute inter-document similarity effectively. Moreover, we show how CSA can be effectively applied in the Information Retrieval domain. Experimental results show that our general technique outperforms baselines built on top of traditional methods, and achieves a performance similar to the ones built on top of specific knowledge bases.File | Dimensione | Formato | |
---|---|---|---|
beneventano2018.pdf
Accesso riservato
Tipologia:
VOR - Versione pubblicata dall'editore
Dimensione
2.17 MB
Formato
Adobe PDF
|
2.17 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
POSTPRINT_j.is.2018.02.009.pdf
Open access
Tipologia:
AAM - Versione dell'autore revisionata e accettata per la pubblicazione
Dimensione
1.66 MB
Formato
Adobe PDF
|
1.66 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris