Annual Computer Security Applications Conference (ACSAC) 2018

Using Loops For Malware Classification Resilient to Feature-unaware Perturbations

In the past few years, both industry and the academic community have developed several approaches to detect malicious Android apps. State-of-the-art research approaches achieve very high accuracy when performing malware detection on existing datasets. These approaches perform their malware classification tasks in an “offline” scenario, where malware authors cannot learn from these systems and adapt their malicious apps to them. In real-world deployments, however, adversaries get feedback about whether their app was detected, and can react accordingly by transforming their code until they are able to influence the classification.

In this work, we propose a new approach for detecting Android malware that is designed to be resilient to feature-unaware perturbations without retraining. Our work builds on two key ideas. First, we consider only a subset of the codebase of a given app, for both precision and performance. For this paper, our implementation focuses exclusively on the loops contained in a given app. We hypothesize, and empirically verify, that the code contained in apps’ loops is enough to precisely detect malware. This provides the additional benefits of being less prone to noise and errors, and being more performant.

The second idea is to build a feature space by extracting a set of labels for each loop, and by then considering each unique combination of these labels as a different feature. The combinatorial nature of this feature space makes it prohibitively difficult for an attacker to influence our feature vector and avoid detection without access to the specific model used for classification.

We assembled these techniques into a prototype, called LoopMC, which can locate loops in applications, extract features, and perform classification, without requiring source code. We used LoopMC to classify about 20,000 benign and malicious applications. While focusing on a smaller portion of the program may seem counterintuitive, the results of these experiments are surprising: our system achieves a classification accuracy of 99.3% and 99.1% for the Malware Genome Project and VirusShare datasets, respectively, which outperforms previous approaches. We also evaluated LoopMC, along with the related work, in the context of various evasion techniques, and show that our system is more resilient to evasion.
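
The second idea, treating each unique combination of loop labels as its own feature, can be illustrated with a minimal sketch. This is not LoopMC's implementation: the label names (`file_io`, `crypto`, `network`) and the helper functions `combination_features` and `to_vector` are hypothetical, and the abstract does not say which labels the system extracts or how the feature vector is encoded. The sketch assumes each loop has already been reduced to a set of labels.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List


def combination_features(loops: Iterable[Iterable[str]]) -> Counter:
    """Count how many loops carry each unique combination of labels.

    Each loop is given as a collection of labels. The whole combination
    (as a frozenset) is the feature, not the individual labels.
    """
    return Counter(frozenset(labels) for labels in loops)


def to_vector(features: Counter, vocabulary: List[FrozenSet[str]]) -> List[int]:
    """Project the combination counts onto a fixed, ordered vocabulary."""
    return [features.get(combo, 0) for combo in vocabulary]


# Hypothetical labels for the loops of one app (assumption, not from the paper).
app_loops = [
    {"file_io", "crypto"},
    {"network"},
    {"crypto", "file_io"},   # same combination as the first loop
    {"network", "crypto"},
]

features = combination_features(app_loops)

# A deterministic ordering of the combinations seen in this example.
vocabulary = sorted(features, key=lambda combo: sorted(combo))
vector = to_vector(features, vocabulary)
```

In this encoding, adding a single label to a loop does not increment an independent counter. It moves that loop to a different combination, which is one way to read the abstract's claim that the combinatorial feature space is hard to influence without knowledge of the model.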

Aravind Machiry
UC Santa Barbara
United States

Nilo Redini
UC Santa Barbara
United States

Eric Gustafson
UC Santa Barbara
United States

Yanick Fratantonio
EURECOM
France

Yung Ryn Choe
Sandia National Laboratories
United States

Christopher Kruegel
UC Santa Barbara
United States

Giovanni Vigna
UC Santa Barbara
United States

 


