Boosting Neural Networks to Decompile Optimized Binaries

Decompilation aims to transform a low-level program language (LPL) (e.g., a binary file) into its functionally equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software analysis (e.g., vulnerability discovery and malware analysis), especially in analyzing commercial software whose source code is unavailable. The design and development of conventional rule-based decompilers are labor-intensive and time-consuming. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing ideas from NMT. They formulate decompilation as a translation problem between LPL and HPL, aiming to reduce the human cost of developing decompilation tools and to improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose NeurDP, a novel learning-based approach that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. We evaluate NeurDP on datasets containing various types of statements. Evaluation results show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.

Ying Cao
Institute of Information Engineering, Chinese Academy of Sciences

Ruigang Liang
Institute of Information Engineering, Chinese Academy of Sciences

Kai Chen
Institute of Information Engineering, Chinese Academy of Sciences

Peiwei Hu
Institute of Information Engineering, Chinese Academy of Sciences

Paper (ACM DL)

Slides