CANCELLED: Tutorial T4 – Acquisition and Analysis of Large Scale Network Data V.4

Dr. John McHugh, Dalhousie University

Tuesday, December 9th, Full Day

Detecting malicious activity in network traffic is greatly complicated by the large amounts of noise, junk, and other questionable traffic that can serve as cover for these activities. With the advent of low-cost mass storage devices and inexpensive computer memory, it has become possible to collect and analyze network data covering periods of weeks, months, or even years. This tutorial will present techniques for collecting and analyzing such data, whether it takes the form of network flow data obtained from routers and other flow collectors, flow data derived from packet headers or from full packet captures such as those collected by tcpdump, or records constructed from application logs. This version of the course will consist of a half day of lecture introducing the core tools, followed by a half day of hands-on analysis using a data set to be provided.

Because of the quantity of data involved, we develop techniques, based on filtering the recorded data stream, for identifying groups of source or destination addresses of interest and extracting the raw data associated with them. The address groups can be represented as sets or multisets (bags) and used to refine the analysis. For example, the set of addresses within a local network that appear as source addresses for outgoing traffic in a given time interval approximates the currently active population of the local network. This set can be used to partition incoming traffic into traffic that might be legitimate and traffic that probably is not, since it is addressed to inactive systems. Further analysis of the questionable traffic yields smaller partitions that can be identified as scanners, DDoS backscatter, etc., based on TCP flag combinations and packet statistics. External hosts that appear as sources in both partitions can be examined for evidence that the active systems they contacted have been compromised. The analysis can also be used to characterize normal traffic for a customer network and to serve as a basis for identifying anomalous traffic that may warrant further examination.
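The set-based partitioning described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the toolset used in the tutorial: the `Flow` record, the `LOCAL_NET` block, the sample flows, and the flag heuristics in `classify_questionable` are all hypothetical simplifications chosen for the example. Python sets and `collections.Counter` stand in for the sets and bags discussed in the text.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass

# Hypothetical, simplified flow record; real flow records carry many more fields
# (ports, protocol, byte counts, timestamps, ...).
@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    flags: str    # TCP flags seen over the flow, e.g. "S", "SA", "FSPA"
    packets: int

# Assumed local address block for the monitored network.
LOCAL_NET = ipaddress.ip_network("10.1.0.0/16")


def is_local(addr: str) -> bool:
    return ipaddress.ip_address(addr) in LOCAL_NET


def active_set(flows):
    """Local addresses seen as sources of outgoing traffic (approximate active population)."""
    return {f.src for f in flows if is_local(f.src) and not is_local(f.dst)}


def partition_incoming(flows, active):
    """Split incoming flows by whether their destination is in the active set."""
    to_active, to_inactive = [], []
    for f in flows:
        if is_local(f.dst) and not is_local(f.src):
            (to_active if f.dst in active else to_inactive).append(f)
    return to_active, to_inactive


def classify_questionable(flow):
    """Crude, assumed flag heuristics for traffic sent to inactive addresses."""
    flags = set(flow.flags)
    if flags == {"S"} and flow.packets <= 2:
        return "scan"           # bare SYNs, typical of scanning
    if flags == {"S", "A"} or ("R" in flags and flags <= {"R", "A"}):
        return "backscatter"    # SYN-ACK or RST replies to spoofed sources
    return "other"


def sources_in_both(to_active, to_inactive):
    """External sources that contacted both active and inactive addresses."""
    return {f.src for f in to_active} & {f.src for f in to_inactive}


# Made-up sample flows using documentation address ranges.
SAMPLE_FLOWS = [
    Flow("10.1.0.5", "198.51.100.7", "FSPA", 12),   # outgoing: 10.1.0.5 is active
    Flow("10.1.0.9", "198.51.100.8", "FSPA", 6),    # outgoing: 10.1.0.9 is active
    Flow("198.51.100.7", "10.1.0.5", "FSPA", 10),   # reply to an active host
    Flow("192.0.2.66", "10.1.3.3", "S", 1),         # SYN to an unused address
    Flow("192.0.2.66", "10.1.0.5", "S", 1),         # same source reaches an active host
    Flow("203.0.113.9", "10.1.4.4", "SA", 1),       # unsolicited SYN-ACK
]

if __name__ == "__main__":
    active = active_set(SAMPLE_FLOWS)
    to_active, to_inactive = partition_incoming(SAMPLE_FLOWS, active)
    print("active:", sorted(active))
    print("classes:", Counter(classify_questionable(f) for f in to_inactive))
    print("bag of questionable sources:", Counter(f.src for f in to_inactive))
    print("sources in both partitions:", sources_in_both(to_active, to_inactive))
```

In this toy data, `192.0.2.66` appears in both partitions, so the active host it reached (`10.1.0.5`) would be a candidate for closer inspection. Real analyses would apply the same idea to far larger stored data with purpose-built tools and more careful classification rules.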


  1. Introduction
  2. Overview of the toolset and core tools
  3. Guided examples
  4. Directed Studies
  5. Outbrief


General familiarity with IP network protocols. Elementary familiarity with simple statistical measures. The tutorial will consist of morning lectures followed by a guided hands-on session in the afternoon.

We will have a limited number of workstations with the necessary tools and data installed; however, students are encouraged to bring their own laptops if possible, as maximum benefit will be obtained by doing the exercises individually. The tools can be loaded from and will work on most varieties of Unix, including Linux, Mac OS X, Solaris, OpenBSD, etc. They should work on a Unix installation running under VMware or a similar virtualization product on Windows, but do not currently run as native Windows applications. Students registering for the course are encouraged to download, build, and test the tools before the tutorial. The data to be used for the exercises will be available for download by mid-November, as well as on USB disks and DVDs at the tutorial. We anticipate using 5 to 10 GB of data. If laptop disk space is a problem, past experience shows that a fast USB drive provides adequate performance in most cases. The instructor will be available to help with any installation problems the evening before the tutorial and can be contacted by registered attendees for help before then.

About the Instructor

Dr. John McHugh is the Canada Research Chair in Privacy and Security at Dalhousie University in Halifax, NS. His research interests include network data analysis, visualization of network behaviors, and related aspects of computer security. He regularly teaches a semester course in network data analysis and intrusion detection. He is one of the few external users and developers of the SiLK suite for NetFlow analysis and has published extensively in the field. Prior to joining Dalhousie, he was a senior member of the technical staff with the CERT Situational Awareness Team, where he did research in survivability, network security, and intrusion detection. He was previously a professor and chairman of the Computer Science Department at Portland State University in Portland, Oregon. His broader research interests include computer security, software engineering, and programming languages. He has previously taught at the University of North Carolina and at Duke University. He was the architect of the Gypsy code optimizer and the Gypsy Covert Channel Analysis tool. Dr. McHugh received his PhD in computer science from the University of Texas at Austin. He holds an MS in computer science from the University of Maryland and a BS in physics from Duke University.