
LOBO - Evaluation of Generalization Deficiencies in Twitter Bot Classifiers

Botnets in online social networks are a growing problem: they disrupt the regular flow of discussion, attack regular users and their posts, spam them with irrelevant or offensive products, and can even stifle adoption and user-base growth. New botnets, or mutations of existing ones, are constantly being discovered by the respective platforms and by researchers, as bot-masters actively and creatively try to evade any form of detection. As a result, existing detection methods become obsolete soon after they are deployed and become apparent to the bot-masters.

In this paper, we argue for the need for generalized evaluation in Twitter bot detection. We put forth a methodology for evaluating bot classifiers by testing them on unseen bot classes. We show that this methodology is empirically robust: applying it to bot classes of varying sizes and characteristics yields similar results. We suggest that methods trained and tested on single bot classes or datasets risk failing to generalize to new bot classes. We train one such classifier on over 200,000 data points and show that it achieves over 97% accuracy. The data used to train and test this classifier are among the largest and most varied collections of bots used in the literature. We then test this theoretically sound classifier using our methodology and find that it does not generalize well to unseen bot classes. Finally, we discuss the implications of our results and the reasons why some bot classes are easier and faster to detect than others.
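
The idea of testing a classifier on bot classes it never saw during training can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy two-feature data, and the nearest-centroid classifier are hypothetical stand-ins, not the classifier, features, or datasets used in the paper. Each bot class is held out in turn, a classifier is trained on the remaining data, and the fraction of held-out bots that are detected is reported.

```python
def leave_one_bot_class_out(samples):
    """Yield (held_out_class, train, test) once per bot class.

    Each sample is (features, bot_class); bot_class is None for genuine users.
    The test set contains only the held-out bot class.
    """
    bot_classes = sorted({c for _, c in samples if c is not None})
    for held_out in bot_classes:
        train = [s for s in samples if s[1] != held_out]
        test = [s for s in samples if s[1] == held_out]
        yield held_out, train, test


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_centroid(train):
    """Stand-in classifier (an assumption, not the paper's model)."""
    bot_c = centroid([f for f, c in train if c is not None])
    human_c = centroid([f for f, c in train if c is None])

    def predict_is_bot(features):
        return sq_dist(features, bot_c) < sq_dist(features, human_c)

    return predict_is_bot


def held_out_detection_rates(samples):
    """Fraction of each held-out bot class that is flagged as bot."""
    rates = {}
    for held_out, train, test in leave_one_bot_class_out(samples):
        predict = train_nearest_centroid(train)
        hits = sum(predict(f) for f, _ in test)
        rates[held_out] = hits / len(test)
    return rates


# Invented toy data: classes "A" and "B" resemble each other,
# while class "C" resembles genuine users.
TOY = [
    ([0.0, 0.0], None), ([0.1, 0.1], None), ([0.2, 0.0], None),
    ([1.0, 1.0], "A"), ([1.1, 0.9], "A"),
    ([0.9, 1.1], "B"), ([1.0, 0.9], "B"),
    ([0.1, 0.0], "C"), ([0.0, 0.2], "C"),
]

if __name__ == "__main__":
    for bot_class, rate in held_out_detection_rates(TOY).items():
        print(bot_class, rate)
```

On this toy data, a class that resembles the training bots is caught when held out, while a class unlike any training bot is missed entirely. This is the kind of gap that a single train/test split over mixed bot classes would not reveal.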

Juan Echeverria
University College London
United Kingdom

Nicolas Kourtellis
Telefonica I+D
Spain

Ilias Leontiadis
Telefonica I+D
Spain

Emiliano De Cristofaro
University College London
United Kingdom

Gianluca Stringhini
University College London
United Kingdom

Shi Zhou
University College London
United Kingdom