TW2: Adversarial Machine Learning
Tuesday, 10 December 2019
08:30 - 12:00
Machine learning has seen significant adoption in recent years across a wide range of domains and applications. Deep learning with neural networks has become a highly popular machine learning method due to recent breakthroughs in computer vision, speech recognition, and other areas. However, machine learning algorithms can be fragile and easily fooled. For example, an attacker could add adversarial perturbations, often invisible to the human eye, to an image to cause a deep neural network to misclassify the perturbed image. Such attacks go beyond image classification and are effective across different neural network architectures and applications. Even an adversary with very limited access to a system, and little knowledge of its machine learning components, can devise powerful attacks if they can interact with the system.
There has been growing recognition that machine learning brings new vulnerabilities, and in response to these concerns, an emerging body of work addresses adversarial machine learning. However, the community’s understanding of the nature and extent of the vulnerabilities in machine learning algorithms remains limited. This training workshop will review a broad array of these issues and techniques from both the cybersecurity and machine learning research areas. We will provide an introduction to the challenges typically encountered when building machine learning systems for security purposes and how to tackle them. We will discuss both evasion and data poisoning attacks against classifiers and the associated defensive techniques. We will then describe specialized techniques for both attacking and defending deep learning models, and discuss some applications in the cybersecurity domain.
Finally, we look at a different attack vector in machine learning: the privacy of the data. In a machine-learning-as-a-service setting, the data is handed over to a third-party service provider and some result is returned. This is not always desired or possible, for example due to regulatory restrictions. Several solutions have been proposed for this issue. We will discuss two of them, differential privacy and homomorphic encryption, in detail, giving a glimpse into the cryptographic components of these systems and their individual strengths and weaknesses.
A basic understanding of machine learning is assumed. Programming experience is required, with at least a beginner-level understanding of Python.
The attendees need to bring laptops.
- Introduction to adversarial machine learning (15 min). We will first review a taxonomy of threat models and potential security and privacy attacks against machine learning algorithms. Generally, attacks against machine learning algorithms are categorized based on the effect they have on the classifier, the security violation they cause, and how specific they are.
- Evasion attacks and defenses (30 min). Evasion attacks are the most prevalent type of attack that may be encountered in adversarial settings during system operation. For instance, spammers and hackers often attempt to evade detection by obfuscating the content of spam emails and malware code. In the evasion setting, malicious samples are modified at test time to evade detection; that is, to be misclassified as legitimate. These attacks require no influence over the training data. We will discuss how evasion attacks work and some potential defense mechanisms.
- Hands-on: Evasion attacks (30 min). We will use scikit-learn to train a simple SVM classifier and use the techniques discussed in the previous section to evade the classifier.
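As a taste of the hands-on session, the evasion idea can be sketched in a few lines of scikit-learn. This is our own minimal illustration, not the workshop's actual exercise material: for a linear SVM f(x) = w·x + b, nudging a flagged sample against the weight vector w lowers its score until it crosses the decision boundary.

```python
# Minimal evasion sketch against a linear SVM (illustrative toy data;
# variable names and parameters are ours, not the workshop's materials).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LinearSVC(C=1.0, random_state=0, max_iter=10000).fit(X, y)

# Pick a sample the classifier flags as class 1 ("malicious").
idx = np.where(clf.predict(X) == 1)[0][0]
x = X[idx].copy()

# For a linear model f(x) = w.x + b, moving against w lowers the score.
w = clf.coef_[0]
step = 0.1 * w / np.linalg.norm(w)
while clf.predict([x])[0] == 1:
    x -= step  # nudge the sample toward the "legitimate" side

print(clf.predict([x])[0])  # the perturbed sample now evades detection
```

Real evasion attacks add constraints (e.g., keep the malware functional, bound the perturbation size), but the gradient-following core is the same.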
- Attacks on deep neural networks (30 min). In addition to classic machine learning, we will discuss how adversarial attacks work on deep neural networks. We will also discuss attempted defenses against adversarial examples and why it is hard to defend against adversarial attacks. Furthermore, we will describe generative adversarial networks (GANs) and their implications.
- Hands-on: Attacks on deep neural networks (45 min). Using available open source tools such as cleverhans and the adversarial-robustness-toolbox, the attendees will learn how to evaluate the robustness of neural networks with respect to different attacks. We will train models, and use pretrained ones, to find adversarial examples with these tools and explore the defenses they provide.
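The canonical attack implemented by these libraries is the Fast Gradient Sign Method (FGSM): perturb the input by ε·sign(∇ₓ loss) to increase the loss in one step. The following is a hand-rolled NumPy sketch on a tiny logistic model, a stand-in for the cleverhans/ART workflow; all weights and numbers are our own illustrative choices.

```python
# FGSM sketch on a hand-rolled logistic model (plain NumPy stand-in for
# the cleverhans/ART workflow; weights and inputs are illustrative).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A fixed "trained" linear model: f(x) = sigmoid(w.x + b).
w = np.array([2.0, -1.0, 0.5])
b = 0.1

x = np.array([0.5, 0.2, -0.3])  # clean input, correctly classified as 1
y = 1                           # true label

# For cross-entropy loss, the gradient w.r.t. the input is (p - y) * w.
p = sigmoid(w @ x + b)
grad_x = (p - y) * w

eps = 0.5
x_adv = x + eps * np.sign(grad_x)  # one FGSM step

# Clean input is class 1; the adversarial input flips to class 0.
print(sigmoid(w @ x + b) > 0.5, sigmoid(w @ x_adv + b) > 0.5)
```

In a deep network the input gradient comes from backpropagation rather than a closed form, but the sign-and-step structure of the attack is identical.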
- Data poisoning attacks (30 min). Machine learning algorithms are often re-trained on data collected during operation to adapt to changes in the underlying data distribution. An attacker may poison the training data by injecting carefully designed samples to eventually compromise the whole learning process. Poisoning may thus be regarded as an adversarial contamination of the training data. We will discuss how the poisoning attacks work and some potential defense mechanisms.
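A simple form of poisoning, useful for intuition, is label flipping: the attacker corrupts the labels of part of the training set so the learned model degrades. This toy scikit-learn sketch is our own illustration (dataset, flip rate, and model choice are arbitrary), not the workshop's exercise:

```python
# Label-flipping poisoning sketch: corrupting training labels degrades
# the learned model (toy data; the 80% flip rate is illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Attacker flips 80% of the class-1 training labels to class 0.
pos = np.where(y_tr == 1)[0]
y_pois = y_tr.copy()
y_pois[pos[: int(0.8 * len(pos))]] = 0
poisoned = LogisticRegression(max_iter=1000).fit(X_tr, y_pois)

# Test accuracy drops noticeably for the poisoned model.
print(clean.score(X_te, y_te), poisoned.score(X_te, y_te))
```

More sophisticated poisoning attacks optimize the injected points themselves rather than just flipping labels, but the effect on the training process is the same in kind.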
- Application of Machine Learning to Security (30 min). In the past few years, many researchers have begun to apply machine learning techniques to various security problems. We will go over different applications for machine learning in security settings. Common applications include spam detection, malware detection, intrusion detection, and more. These tasks typically deal with a similar set of problems, such as highly imbalanced datasets, “small” amounts of data, and extracting meaningful features from the data. We will discuss approaches to deal with these challenges. On the other hand, security is a difficult area because adversaries actively manipulate training data and vary attack techniques to defeat new systems. We will discuss adversarial machine learning problems across different security applications to see if there are common problems and effective solutions, and to determine if machine learning can indeed work well in adversarial environments.
- Hands-on: Malware detection for Android apps and adversarial attacks (60 min). The attendees will be given the opportunity to apply some of the techniques discussed in the previous sections. They will build a neural-network-based detection system for Android malware using Keras, TensorFlow, and scikit-learn as the machine learning tools, and various Android analysis tools to extract features from the apps.
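One of the recurring challenges named above, class imbalance, has a simple first-line remedy in scikit-learn: reweighting the loss by class frequency. This sketch on synthetic data is our own illustration of the idea (the 5% "malware" rate and model choice are arbitrary):

```python
# Sketch: handling the class imbalance typical of malware datasets via
# scikit-learn's class_weight option (toy data; rates are illustrative).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# ~5% "malware" (class 1) vs ~95% "benign" (class 0).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=2, stratify=y)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000,
                              class_weight="balanced").fit(X_tr, y_tr)

# Recall on the rare "malware" class: reweighting catches more of it,
# usually at the cost of some extra false positives.
print(recall_score(y_te, plain.predict(X_te)),
      recall_score(y_te, weighted.predict(X_te)))
```

Resampling (over/undersampling) is the other common remedy; which trade-off is acceptable depends on the cost of missed malware versus false alarms.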
- Privacy preserving machine learning (30 min). In addition to the attacks against machine learning algorithms and models, there are attacks against the privacy of data and models. This is an entirely different class of threats in machine learning, where sensitive data and sensitive models must be protected. We discuss approaches such as homomorphic encryption and differential privacy that allow harnessing the power of machine learning without giving up sensitive information.
- Hands-on: Privacy preserving machine learning (60 min). Differential privacy and homomorphic encryption are two different solutions to the same problem of data privacy in machine learning. We will explore tools that provide one or the other, such as TensorFlow Privacy and he-transformer. The attendees will get an understanding of the drawbacks and advantages provided by these systems and how to build machine learning applications based on them.
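The differential-privacy half of the session builds on one core primitive worth previewing: adding calibrated noise to a query so no individual record is identifiable. Below is our own minimal sketch of the Laplace mechanism for a counting query (a count has sensitivity 1, so Laplace noise with scale 1/ε gives ε-DP); systems like TensorFlow Privacy apply the same principle to gradients during training (DP-SGD) rather than to query outputs.

```python
# Differential-privacy sketch: the Laplace mechanism for a counting query
# (our illustration; function name and data are invented for the example).
import numpy as np

def laplace_count(data, predicate, epsilon, rng):
    """Release a count with epsilon-DP; a count query has sensitivity 1,
    so noise drawn from Laplace(0, 1/epsilon) suffices."""
    true_count = sum(1 for x in data if predicate(x))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

rng = np.random.RandomState(0)
ages = [23, 35, 45, 52, 29, 61, 44, 38]
noisy = laplace_count(ages, lambda a: a > 40, epsilon=1.0, rng=rng)
print(round(noisy, 2))  # near the true count of 4, but randomized
```

Smaller ε means stronger privacy and noisier answers; that accuracy/privacy trade-off is the central design question in every DP system.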
About the Instructors
Dr. Daniel Takabi is an Associate Professor in the Department of Computer Science at Georgia State University, where he directs the INformation Security and Privacy: Interdisciplinary Research and Education (INSPIRE) Lab. He received his PhD from the University of Pittsburgh in 2013. His research interests span a wide range of topics in cybersecurity and privacy, including secure and privacy-preserving machine learning, advanced access control models, insider threats, and usable security and privacy. He has published a book, three book chapters, and more than 100 papers in renowned conferences and journals, and is the recipient of several best paper awards and a best poster award. Dr. Takabi serves on the organizing and program committees of several top security conferences, including ACM CCS, IEEE Security and Privacy, ACM CODASPY, and ACSAC. He is a member of IAPP, ACM, and IEEE.
Mr. Robert Podschwadt is a PhD student in computer science at Georgia State University, where his research focuses on machine learning for cybersecurity, and particularly on attacking and defending machine learning systems from adversarial examples. His Master’s thesis (2012, Hochschule der Medien, Stuttgart, Germany) presented innovations in GPU-aided machine learning with applications to energy grid management. Before starting the PhD program, Mr. Podschwadt worked for Sirrix, now Rohde & Schwarz Cybersecurity, where he designed and developed cybersecurity products for industry.