Discovering the topics of a data source: A statistical approach? / Bergamaschi, Sonia; Ferrari, Davide; Guerra, Francesco; Simonini, Giovanni. - 1310:(2014). (Paper presented at the Workshop on Surfacing the Deep and the Social Web, SDSW 2014, co-located with the 13th International Semantic Web Conference, ISWC 2014, held in Italy in 2014.)

Discovering the topics of a data source: A statistical approach?

BERGAMASCHI, Sonia;GUERRA, Francesco;SIMONINI, GIOVANNI
2014

Abstract

In this paper, we present a preliminary approach for automatically discovering the topics of a structured data source with respect to a reference ontology. Our technique relies on a signature, i.e., a weighted graph that summarizes the content of a source. Graph-based approaches have already been used in the literature for similar purposes. In these proposals, the weights are typically assigned using traditional information-theoretic quantities such as entropy and mutual information. Here, we propose a novel data-driven technique based on composite likelihood to estimate the weights and other main features of the graphs, making the resulting approach less sensitive to overfitting. By comparing signatures, we can discover the topic of a target data source with respect to a reference ontology. This task is performed by a matching algorithm that retrieves the elements common to both graphs. To illustrate our approach, we discuss a preliminary evaluation in the form of a running example.
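
As a rough illustration of the signature comparison mentioned in the abstract, the following sketch matches two weighted graphs by their common edges. It is not the authors' algorithm: the edge-to-weight dictionary representation, the function names `edge`, `common_elements`, and `overlap_score`, the min-weight rule, and the example weights are all assumptions made for illustration, and the weights are fixed by hand rather than estimated by composite likelihood.

```python
from typing import Dict, FrozenSet

# Hypothetical representation: a signature maps an undirected edge
# (a pair of term labels) to a non-negative weight.
Signature = Dict[FrozenSet[str], float]


def edge(a: str, b: str) -> FrozenSet[str]:
    """Build an undirected edge key from two term labels."""
    return frozenset((a, b))


def common_elements(source: Signature, reference: Signature) -> Signature:
    """Keep edges present in both signatures, weighted by the smaller weight."""
    return {e: min(w, reference[e]) for e, w in source.items() if e in reference}


def overlap_score(source: Signature, reference: Signature) -> float:
    """Share of the source's total weight covered by the common edges (0 to 1)."""
    total = sum(source.values())
    if total == 0:
        return 0.0
    return sum(common_elements(source, reference).values()) / total


# Illustrative, hand-picked weights (not taken from the paper).
source_sig = {edge("author", "paper"): 0.8, edge("paper", "venue"): 0.2}
reference_sig = {edge("author", "paper"): 0.5, edge("paper", "journal"): 0.5}

shared = common_elements(source_sig, reference_sig)
score = overlap_score(source_sig, reference_sig)
```

Here `shared` holds the single edge that appears in both graphs, and `score` gives one possible way to rank candidate topics when several reference signatures are available.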
Conference: Workshop on Surfacing the Deep and the Social Web, SDSW 2014, co-located with the 13th International Semantic Web Conference, ISWC 2014
Location: Italy
Year: 2014
Volume: 1310
Authors: Bergamaschi, Sonia; Ferrari, Davide; Guerra, Francesco; Simonini, Giovanni

Use this identifier to cite or link to this document: http://hdl.handle.net/11380/1078396