Practical Applications of Bloom filters to the NIST RDS, hard drive triage, and data mining.
Paul Farrell
Naval Postgraduate School
United States
Simson Garfinkel
Naval Postgraduate School
United States
Abstract:
Much effort has been expended in recent years to create large sets of
hash codes from known files. Distributing these sets has become more
difficult as they grow larger. Meanwhile, the value of these sets
for eliminating the need to analyze "known goods" has decreased
as hard drives have dramatically increased in storage
capacity.
This paper evaluates the use of Bloom filters (BFs) to distribute the
National Software Reference Library's (NSRL) Reference Data Set (RDS)
version 2.19, with 13 million SHA-1 hashes. We present an open source
reference BF implementation and validate it against a large collection
of disk images. We discuss the tuning of the filters and how
they can be used to enable new forensic functionality, including watch
lists and cross-drive analysis. We conclude by showing how BFs can improve the
usefulness of hash sets by allowing them to be used routinely for
rapid profiling of hard drives.
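
To make the idea of storing SHA-1 hashes in a BF and tuning it concrete, the following is a minimal Python sketch. It is not the reference implementation described in the abstract. The class name `BloomFilter`, the helper `false_positive_rate`, and the choice to derive bit positions by slicing the SHA-1 digest into 32-bit words are all assumptions made for illustration. The tuning helper uses the standard approximation for the false positive rate of a BF with m bits, k hash functions, and n stored items.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter keyed on SHA-1 digests (hypothetical, not the paper's code)."""

    def __init__(self, m_bits: int, k: int):
        # Assumption: bit positions come from 32-bit slices of the digest,
        # so a 20-byte SHA-1 digest can supply at most five of them.
        if not 1 <= k <= 5:
            raise ValueError("k must be between 1 and 5 for a 20-byte digest")
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, digest: bytes):
        # Each 4-byte slice of the digest selects one bit in the filter.
        for i in range(self.k):
            word = int.from_bytes(digest[4 * i:4 * i + 4], "big")
            yield word % self.m

    def add(self, digest: bytes) -> None:
        for pos in self._positions(digest):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, digest: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(digest)
        )


def false_positive_rate(m_bits: int, k: int, n: int) -> float:
    """Standard approximation: (1 - exp(-k * n / m)) ** k."""
    return (1.0 - math.exp(-k * n / m_bits)) ** k
```

A filter built this way would be populated by calling `add(hashlib.sha1(data).digest())` for each known file and queried with `digest in bf`. A BF never reports a stored hash as absent, but it can report an unseen hash as present, so the tuning trade-off is between filter size and the false positive rate estimated by `false_positive_rate`.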
