The ever-growing amount of textual information coming from different sources has fostered the development of digital libraries, making digital content readily accessible but also easy for malicious users to plagiarize, thus giving rise to security problems. In this paper, we introduce a duplicate detection scheme that is able to determine, with particularly high accuracy, how similar one document is to another. Our pairwise document comparison scheme detects resemblance between the contents of documents by considering document chunks, which represent contexts of words selected from the text. The resulting duplicate detection technique offers a good level of security in the protection of intellectual property, while improving the availability of the data stored in the digital library and the correctness of search results. Finally, the paper addresses efficiency and scalability issues by introducing new data reduction techniques.
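The abstract does not spell out how chunks are chosen or how resemblance is scored. The following is therefore only a minimal sketch of chunk-based pairwise comparison under stated assumptions, not the authors' scheme. Chunks are taken here as overlapping windows of `k` consecutive words, and resemblance is the Jaccard coefficient of the two chunk sets. The optional `modulus` filter is a hypothetical hash-based sampling step that stands in for "data reduction." The helper names `chunks`, `keep_chunk`, and `resemblance` are illustrative only.

```python
from __future__ import annotations

import hashlib
import re


def chunks(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Assumed chunking: overlapping windows of k consecutive lowercase words."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single (possibly shorter) chunk.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def keep_chunk(chunk: tuple[str, ...], modulus: int) -> bool:
    """Hypothetical data-reduction filter: keep a chunk only if a stable
    hash of it is 0 mod `modulus` (modulus=1 keeps every chunk)."""
    digest = hashlib.md5(" ".join(chunk).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus == 0


def resemblance(a: str, b: str, k: int = 3, modulus: int = 1) -> float:
    """Jaccard resemblance |A & B| / |A | B| of the (optionally sampled) chunk sets."""
    ca = {c for c in chunks(a, k) if keep_chunk(c, modulus)}
    cb = {c for c in chunks(b, k) if keep_chunk(c, modulus)}
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


original = "the quick brown fox jumps over the lazy dog"
copy = "the quick brown fox leaps over the lazy dog"
print(resemblance(original, copy))  # 4 shared chunks out of 10 distinct -> 0.4
```

Changing a single word breaks every chunk that overlaps it, so the score drops to 0.4 even though most of the text is shared. Larger `k` values make the measure stricter. Raising `modulus` shrinks the stored chunk sets at the cost of a noisier estimate.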
A Document Comparison Scheme for Secure Duplicate Detection / Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo. - In: INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES. - ISSN 1432-5012. - PRINT. - 4:(2004), pp. 223-244. [10.1007/s00799-004-0079-7]
A Document Comparison Scheme for Secure Duplicate Detection
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
2004
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.