Full Program »
Make Data Reliable : An Explanation-powered Cleaning on Malware Dataset Against Backdoor Poisoning Attacks
Machine learning (ML) based Malware classification provides excellent performance and has been deployed in various real-world applications. Training for malware classification often relies on crowdsourced threat feeds, which exposes a natural attack injection point. Considering a real-world threat model for backdoor poisoning attacks on a malware dataset, because attackers are generally considered to have no control over the sample-labeling process, they conduct a clean-label attack, a more realistic scenario, by generating backdoored benign binaries that will be disseminated through threat intelligence platforms and poison the datasets for downstream malware classifiers. To avoid the threat of backdoor poisoned datasets, we propose an explanation-powered defense methodology called make data reliable (MDR), which is a general and effective mitigation to ensure the reliability of datasets by removing backdoored samples. We use a surrogate model and explanation tool Shapley Additive exPlanations (SHAP) to filter suspicious samples, then perform watermark identification based on the filtered suspicious samples, and finally remove samples with the identified watermark to construct a reliable dataset. We conduct extensive experiments on two typical datasets that were manually poisoned using different attack strategies. Experimental results show that the MDR achieves backdoored samples removal rate greater than 99.0% for different datasets and attack conditions, while maintaining an extremely low false positive rate of less than 0.1%. Furthermore, to confirm the generality of MDR, we use different models to perform a model-agnostic evaluation. The results show that, MDR is a general methodology that does not rely on any specific model.