The production of machine-readable data in the form of RDF datasets belonging to the Linked Open Data (LOD) Cloud is growing very fast. However, selecting relevant knowledge sources from the Cloud, assessing the quality and extracting synthetical information from a LOD source are all tasks that require a strong human effort. This paper proposes an approach for the automatic extraction of the more representative information from a LOD source and the creation of a set of indexes that enhance the description of the dataset. These indexes collect statistical information regarding the size and the complexity of the dataset (e.g. the number of instances), but also depict all the instantiated classes and the properties among them, supplying user with a synthetical view of the LOD source. The technique is fully implemented in LODeX, a tool able to deal with the performance issues of systems that expose SPARQL endpoints and to cope with the heterogeneity on the knowledge representation of RDF data. An evaluation on LODeX on a large number of endpoints (244) belonging to the LOD Cloud has been performed and the effectiveness of the index extraction process has been presented.
Online Index Extraction from Linked Open Data Sources / Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura. - ELETTRONICO. - 1267:(2014), pp. 9-20. (Intervento presentato al convegno 2nd International Workshop on Linked Data for Information Extraction, LD4IE 2014, Co-located with the 13th International Semantic Web Conference, ISWC 2014 tenutosi a Riva del Garda, Italy nel October 20, 2014).
Online Index Extraction from Linked Open Data Sources
BENEDETTI, FABIO;BERGAMASCHI, Sonia;PO, Laura
2014
Abstract
The production of machine-readable data in the form of RDF datasets belonging to the Linked Open Data (LOD) Cloud is growing very fast. However, selecting relevant knowledge sources from the Cloud, assessing the quality and extracting synthetical information from a LOD source are all tasks that require a strong human effort. This paper proposes an approach for the automatic extraction of the more representative information from a LOD source and the creation of a set of indexes that enhance the description of the dataset. These indexes collect statistical information regarding the size and the complexity of the dataset (e.g. the number of instances), but also depict all the instantiated classes and the properties among them, supplying user with a synthetical view of the LOD source. The technique is fully implemented in LODeX, a tool able to deal with the performance issues of systems that expose SPARQL endpoints and to cope with the heterogeneity on the knowledge representation of RDF data. An evaluation on LODeX on a large number of endpoints (244) belonging to the LOD Cloud has been performed and the effectiveness of the index extraction process has been presented.File | Dimensione | Formato | |
---|---|---|---|
ceur-ws-org-Vol-1267-LD4IE2014_Benedetti.pdf
Open access
Tipologia:
Versione pubblicata dall'editore
Dimensione
328.78 kB
Formato
Adobe PDF
|
328.78 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris