Synthesizing test data for fraud detection systems

Emilie Lundin, Hakan Kvarnstrom and Erland Jonsson
Chalmers University of Technology

This paper reports an experiment in generating synthetic test data for fraud detection in an IP-based video-on-demand service. The experiment verifies a methodology previously developed by the present authors, which uses authentic normal data and fraud data as a seed for the synthetic data so that important statistical properties of the authentic data are preserved. This enables us to create realistic behavior profiles for users and attackers. The data can also be used to train the fraud detection system itself, thereby adapting the system to a specific environment. Here we aim to verify the usability and applicability of the synthetic data by using them to train a fraud detection system. The trained system is then exposed to a set of authentic data and to a corresponding set of synthetic data, parameters such as detection capability and false alarm rate are measured on each, and the results are compared.

Keywords: fraud detection, synthetic test data, user simulation, evaluation, verification
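
The comparison described in the abstract, measuring detection capability and false alarm rate on an authentic set and on a corresponding synthetic set, can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `detection_metrics`, the per-event boolean labels, and the definitions used here (detection rate as the fraction of fraud events that raise an alarm, false alarm rate as the fraction of normal events that raise an alarm) are assumptions made for illustration, and the toy inputs are invented rather than taken from the experiment.

```python
from typing import Sequence, Tuple


def detection_metrics(is_fraud: Sequence[bool],
                      alarm: Sequence[bool]) -> Tuple[float, float]:
    """Return (detection_rate, false_alarm_rate) for one evaluation set.

    Assumed definitions (not taken from the paper):
      detection_rate   = alarmed fraud events  / all fraud events
      false_alarm_rate = alarmed normal events / all normal events
    """
    if len(is_fraud) != len(alarm):
        raise ValueError("labels and alarms must have the same length")

    true_alarms = sum(1 for f, a in zip(is_fraud, alarm) if f and a)
    false_alarms = sum(1 for f, a in zip(is_fraud, alarm) if not f and a)
    n_fraud = sum(1 for f in is_fraud if f)
    n_normal = len(is_fraud) - n_fraud

    # Guard against empty classes so the sketch never divides by zero.
    detection_rate = true_alarms / n_fraud if n_fraud else 0.0
    false_alarm_rate = false_alarms / n_normal if n_normal else 0.0
    return detection_rate, false_alarm_rate


# Invented toy data: ground-truth labels and detector alarms per event.
authentic_labels = [True, True, False, False, False, False]
authentic_alarms = [True, False, True, False, False, False]
synthetic_labels = [True, True, False, False, False, False]
synthetic_alarms = [True, True, False, False, False, False]

authentic_result = detection_metrics(authentic_labels, authentic_alarms)
synthetic_result = detection_metrics(synthetic_labels, synthetic_alarms)
print("authentic (detection, false alarm):", authentic_result)
print("synthetic (detection, false alarm):", synthetic_result)
```

Running the same function over both sets yields two pairs of rates that can be compared directly; how close the pairs must be for the synthetic data to count as usable is a judgment this sketch leaves open.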
