As the most widely used mobile platform, Android is also the biggest target for mobile malware. Given the increasing number of Android malware variants, detecting malware families is crucial so that security analysts can identify situations where signatures of a known malware family can be adapted as opposed to manually inspecting behavior of all samples. We present EC2 (Ensemble Clustering and Classification), a novel algorithm for discovering Android malware families of varying sizes - ranging from very large to very small families (even if previously unseen). We present a performance comparison of several traditional classification and clustering algorithms for Android malware family identification on DREBIN, the largest public Android malware dataset with labeled families. We use the output of both supervised classifiers and unsupervised clustering to design EC2. Experimental results on both the DREBIN and the more recent Koodous malware datasets show that EC2 accurately detects both small and large families, outperforming several comparative baselines. Furthermore, we show how to automatically characterize and explain unique behaviors of specific malware families, such as FakeInstaller, MobileTx, Geinimi. In short, EC2 presents an early warning system for emerging new malware families, as well as a robust predictor of the family (when it is not new) to which a new malware sample belongs, and the design of novel strategies for data-driven understanding of malware behaviors.

EC2: Ensemble Clustering & Classification for predicting Android malware families / Chakraborty, Tanmoy; Pierazzi, Fabio; Subrahmanian, V. S.. - In: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING. - ISSN 1545-5971. - 17:2(2020), pp. 262-277. [10.1109/TDSC.2017.2739145]

EC2: Ensemble Clustering & Classification for predicting Android malware families

Fabio Pierazzi
Methodology
;
2020

Abstract

As the most widely used mobile platform, Android is also the biggest target for mobile malware. Given the increasing number of Android malware variants, detecting malware families is crucial so that security analysts can identify situations where signatures of a known malware family can be adapted as opposed to manually inspecting behavior of all samples. We present EC2 (Ensemble Clustering and Classification), a novel algorithm for discovering Android malware families of varying sizes - ranging from very large to very small families (even if previously unseen). We present a performance comparison of several traditional classification and clustering algorithms for Android malware family identification on DREBIN, the largest public Android malware dataset with labeled families. We use the output of both supervised classifiers and unsupervised clustering to design EC2. Experimental results on both the DREBIN and the more recent Koodous malware datasets show that EC2 accurately detects both small and large families, outperforming several comparative baselines. Furthermore, we show how to automatically characterize and explain unique behaviors of specific malware families, such as FakeInstaller, MobileTx, Geinimi. In short, EC2 presents an early warning system for emerging new malware families, as well as a robust predictor of the family (when it is not new) to which a new malware sample belongs, and the design of novel strategies for data-driven understanding of malware behaviors.
2020
21-ago-2017
17
2
262
277
EC2: Ensemble Clustering & Classification for predicting Android malware families / Chakraborty, Tanmoy; Pierazzi, Fabio; Subrahmanian, V. S.. - In: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING. - ISSN 1545-5971. - 17:2(2020), pp. 262-277. [10.1109/TDSC.2017.2739145]
Chakraborty, Tanmoy; Pierazzi, Fabio; Subrahmanian, V. S.
File in questo prodotto:
File Dimensione Formato  
ec2.pdf

Accesso riservato

Tipologia: Versione pubblicata dall'editore
Dimensione 719.54 kB
Formato Adobe PDF
719.54 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

Licenza Creative Commons
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11380/1158833
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 76
  • ???jsp.display-item.citation.isi??? 50
social impact