Re-identification and information fusion between anonymized CDR and social network data / Cecaj, Alket; Mamei, Marco; Zambonelli, Franco. - In: JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING. - ISSN 1868-5137. - 7:1(2016), pp. 83-96. [10.1007/s12652-015-0303-x]

Re-identification and information fusion between anonymized CDR and social network data

Cecaj, Alket; Mamei, Marco; Zambonelli, Franco
2016

Abstract

The analysis of multiple datasets on users’ behaviors opens interesting information fusion possibilities and, at the same time, creates a potential for re-identification and de-anonymization of users’ data. On the one hand, these kinds of approaches can breach users’ privacy despite anonymization. On the other hand, combining different datasets is a key enabler for advanced context-awareness, in that information from multiple sources can complement and enrich each other. In this work we analyze different anonymized mobility datasets to highlight re-identification and information fusion possibilities. In particular, we focus on call detail record (CDR) datasets released by mobile telecom operators and on datasets comprising geo-localized messages released by social network sites. Results show that: (1) in line with previous findings, a few (about 4) data points are enough to uniquely pinpoint the majority (90 %) of the users; (2) more than 20 % of CDR users have a single social network user exhibiting a number of matching data points, and we speculate that these two users might be the same person; (3) we derive an estimate of the probability of two users being the same person given the number of data points they have in common, and estimate that for 3 % of the social network users we can find a CDR user very likely (>90 % probability) to be the same person.
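Findings (1) and (2) rest on two simple counts: how many users remain compatible with a handful of a user's own data points, and how many data points a CDR user shares with each social network user. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes both datasets have already been mapped onto a common grid of locations (e.g., cell-tower areas) and discrete time bins, so a data point is a hypothetical `(location, time_bin)` pair. The names `unicity`, `matching_points`, and `single_match` are illustrative, and the threshold `min_points` stands in for "a number of matching data points", whose exact value the abstract does not state.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Optional, Set, Tuple

# Hypothetical data-point format: (location, time_bin), e.g. a cell-tower ID and an
# hour index. Both datasets are assumed to be already mapped onto this common grid.
Point = Tuple[Hashable, int]


def unicity(traces: Dict[str, Set[Point]], k: int, seed: int = 0) -> float:
    """Fraction of users uniquely pinpointed by k points drawn from their own trace."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    # Inverted index: data point -> users whose trace contains that point.
    index: Dict[Point, Set[str]] = defaultdict(set)
    for user, points in traces.items():
        for p in points:
            index[p].add(user)

    # Only users with at least k points can be tested with k points.
    eligible = [u for u, pts in traces.items() if len(pts) >= k]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        # Sort for reproducible sampling (set order can vary between runs).
        sample = rng.sample(sorted(traces[user], key=repr), k)
        # Users compatible with the sample contain every sampled point.
        candidates = set.intersection(*(index[p] for p in sample))
        if candidates == {user}:
            unique += 1
    return unique / len(eligible)


def matching_points(a: Set[Point], b: Set[Point]) -> int:
    """Number of data points shared by two traces."""
    return len(a & b)


def single_match(cdr_trace: Set[Point], social: Dict[str, Set[Point]],
                 min_points: int) -> Optional[str]:
    """Return the only social user sharing >= min_points with cdr_trace, else None."""
    hits = [u for u, pts in social.items()
            if matching_points(cdr_trace, pts) >= min_points]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    traces = {
        "a": {("c1", 1), ("c2", 2), ("c3", 3)},
        "b": {("c1", 1), ("c2", 2), ("c4", 4)},
        "c": {("c5", 5), ("c6", 6)},
    }
    print(unicity(traces, 3))  # 1.0: "a" and "b" are each unique given all three points
    social = {"x": {("c1", 1), ("c2", 2)}, "y": {("c1", 1)}}
    print(single_match(traces["a"], social, min_points=2))  # x
```

Matching on exact `(location, time_bin)` equality is a deliberate simplification: CDR cells and geotagged social posts differ in spatial and temporal resolution, so a realistic comparison would need some tolerance in both dimensions, which this sketch leaves out.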
Files in this item:
permobyj.pdf (restricted access; type: author's revised version accepted for publication; Adobe PDF, 3.13 MB)

Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1116633
Citations
  • PubMed Central: N/A
  • Scopus: 33
  • Web of Science (ISI): 28