The ever-growing volume of textual information from different sources has fostered the development of digital libraries, making digital content readily accessible but also easy for malicious users to plagiarize, which raises security problems. In this paper, we introduce a duplicate detection scheme that determines, with particularly high accuracy, how similar one document is to another. Our pairwise document comparison scheme detects resemblance between the contents of documents by considering document chunks, which represent the contexts of words selected from the text. The resulting duplicate detection technique offers a good level of security in protecting intellectual property, while improving the availability of the data stored in the digital library and the correctness of search results. Finally, the paper addresses efficiency and scalability by introducing new data reduction techniques.

A Document Comparison Scheme for Secure Duplicate Detection / Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo. - In: INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES. - ISSN 1432-5012. - PRINT. - 4:(2004), pp. 223-244. [10.1007/s00799-004-0079-7]

A Document Comparison Scheme for Secure Duplicate Detection

Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
2004

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/305196