Spotlight: Malware Lead Generation at Scale

Malware is one of the key threats to online security today, with applications ranging from phishing mailers to ransomware and trojans. Due to the sheer size and variety of the malware threat, it is impractical to combat it as a whole. Instead, governments and companies have instituted teams dedicated to identifying, prioritizing, and removing specific malware families that directly affect their population or business model. The identification and prioritization of the most concerning malware families (known as malware hunting) is a time-consuming activity, accounting for more than 20% of the work hours of a typical threat intelligence researcher, according to our survey. To save this precious resource and amplify the team's impact on users' online safety, we present a large-scale malware lead-generation framework. Our framework first sifts through a large malware dataset to remove known malware families, based on first- and third-party threat intelligence. It then clusters the remaining malware into potentially undiscovered families and prioritizes them for further investigation using a score based on their potential business impact.

We evaluate our framework on 67M malware samples and show that it produces top-priority clusters with over 99% purity (i.e., homogeneity), which is higher than simpler approaches and prior work. To showcase our framework's effectiveness, we apply it to ad-fraud malware hunting on real-world data. Using our framework's output, threat intelligence researchers were able to quickly identify three large botnets that perform ad fraud.
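
As an illustration of the purity metric mentioned above, the following is a minimal sketch that assumes the common definition: the fraction of samples in a cluster that share the cluster's most frequent family label. The function names and the toy labels are hypothetical, and the paper's exact definition and evaluation procedure may differ.

```python
from collections import Counter


def cluster_purity(labels):
    """Fraction of samples in one cluster that carry the majority label."""
    if not labels:
        raise ValueError("cluster must contain at least one sample")
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)


def overall_purity(clusters):
    """Size-weighted purity over several clusters (assumed aggregation)."""
    total = sum(len(c) for c in clusters)
    if total == 0:
        raise ValueError("clusters must contain at least one sample")
    majority_total = sum(Counter(c).most_common(1)[0][1] for c in clusters if c)
    return majority_total / total


# Toy data: hypothetical family labels, not taken from the paper.
clusters = [
    ["family_a"] * 99 + ["family_b"],  # one stray sample in 100
    ["family_c"] * 50,                 # perfectly homogeneous
]

first_purity = cluster_purity(clusters[0])
combined_purity = overall_purity(clusters)
```

Under this definition, a cluster with a single mislabeled sample out of 100 has a purity of 99%, which gives a sense of how strict a "99% or higher" threshold is.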

Fabian Kaczmarczyck
Google

Bernhard Grill
Google

Luca Invernizzi
Google

Jennifer Pullman
Google

Cecilia M. Procopiuc
Google

David Tao
Google

Borbala Benko
Google

Elie Bursztein
Google

Paper (ACM DL)

Slides