Full Program »
T4: Practical Formal Methods for the Analysis of Executable Code
Tuesday, 5 December 2017
08:30 - 12:00
Several major classes of security analysis have to be performed on raw executable files, such as vulnerability analysis of mobile code or commercial off-the-shelf, deobfuscation or malware inspection. These analysis are highly challenging, due to the very low-level and intricate nature of binary code, and there is a clear need for more sophisticated and automated tools than currently available syntactic and dynamic approaches. On the other hand, source-level program analysis and formal methods have made tremendous progress in the past decade, and they are now an industrial reality for safety-critical applications. The next big challenge is to lift them to tackle security for non-critical systems.
This course aims to present the current state-of-the-art in formal methods for binary-level security analysis, and especially how semantic analysis coming from source-level safety analysis can be adapted to the context of binary-level security analysis. Especially, we will:
- Recall the benefits & challenges offered by binary code analysis;
- Review the new challenges brought to formal methods by binary code analysis;
- Present promising state-of-the-art techniques (Symbolic Execution, Abstract Interpretation) together with illustrating examples and discussions on limitations – including implementation issues and traps;
- Describe early achievements obtained by mitigating these limitiations through well-chosen combinations of techniques;
Finally, we will conclude by discussing general strengths and weaknesses of such formal approaches for security, and give directions for improvements. Through the course, students will get a wide panorama of a fast growing but complicated field, in order to understand better its promises, limits and fallpits – including anecdots on case-studies and implementation issues. Existing binary-level analysis tools can be extremely powerful in the hands of advanced users, this course will set up the stage so that attendees can become advanced users.
- Basic notions of low-level programming and/or formal methods and/or compilation.
1. Introduction [20 minutes]
The introduction starts with an overview of the challenges and benefits of analyses for binary code. Then we will show an overview of formal methods for binary-level analysis. In particular we will tackle the following topics:
- Specific examples of formal methods to show what they are;
- What are the specific areas of interests for the application of formal methods to analyze binary code;
- What are the specific challenges of tailoring known formal methods to binary code.
2. Formal methods for binary code: an overview --- [105 minutes]
The need for an Intermediate Representation [20 minutes] Semantics of binary code can be very involved. In order to make the analyses portable, there is a clear need to adopt an Intermediate Representation (IR) over which the analyses will be made. This IR in turn needs have a precise semantics and be expressive enough to both detail what happens at low-level and express properties on the IR. Platforms of analysis of binary code have used REIL, DBA, Valgrind's VEX, QEMU's IR or LLVM. We will present the principles underlying these IRs as well as simplifications methods to keep the intermediate encodings compact.
Symbolic Execution --- [45 minutes] Symbolic Execution (SE) [CS13] is the method of choice when we need to explore a subset of the behaviors of the code. We will present static and dynamic versions of SE and show the challenges that SE needs to address when used at the binary level such as efficiency [Kin14], concretization/abstraction, the need for simplifications to interact efficiently with the underlying solvers, etc. Examples of use of SE will include CTF challenges and vulnerability search. We will also show the limitations of these techniques and the current open challenges.
Static analysis --- [40 minutes] It turns out that adapting sound static analyses to binary code is highly challenging [BR10] and needs dedicated tuning. We will describe some of these adaptation such as precise Control Flow Graph recovery, high-level predicates recovery [DBG16], type recovery and ways to make static analysis more robust [Kin12].
3. Early achievements --- [45 minutes]
The methods of Section II can be combined to leverage their pros while mitigating their cons. This part will show some successful combinations of analyses using formal methods, with examples taken from vulnerability search and deobfuscation.
Vulnerability. We will present how formal methods can enhance the search for specific code defects, such as by combining a scalable static analysis and dynamic symbolic execution to find Use-after-Free bugs [FMB+16]. Also we will show how one can combine fuzzing and symbolic execution [SGS+16] to find more bugs than with either one of these techniques and discuss state-of-the-art SE for vulnerability detection [ARCB16].
Deobfuscation. We will also present how formal methods have been able to tackle various aspects of deobfuscation [YD15], typically to help malware comprehension. In particular they help indentifying opaque predicates and call-stack tampering [BDM17], or devirtualizing VM-protected code [Kin12, YJWD15].
4. Conclusion --- [10 minutes]
The conclusion will summarize the current formal methods for binary code analysis and present further open challenges on the roadmap for the coming years.
[ARCB16] Thanassis Avgerinos, Alexandre Rebert, Sang Kil Cha, and David Brumley. Enhancing symbolic execution with veritesting. Commun. ACM, 59(6):93 100, 2016.
[BDM17] Sébastien Bardin, Robin David, and Jean-Yves Marion. Backwardbounded DSE: Targeting Infeasibility Questions on Obfuscated Codes. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-24, 2017, 2017.
[BR10] Gogul Balakrishnan and Thomas W. Reps. WYSINWYX: what you see is not what you execute. ACM Trans. Program. Lang. Syst., 32(6):23:1-23:84, 2010.
[CS13] Cristian Cadar and Koushik Sen. Symbolic execution for software testing: three decades later. Commun. ACM, 56(2):8290, 2013.
[DBG16] Adel Djoudi, Sébastien Bardin, and Éric Goubault. Recovering highlevel conditions from binary programs. In John S. Fitzgerald, Constance L. Heitmeyer, Stefania Gnesi, and Anna Philippou, editors, FM 2016: Formal Methods - 21st International Symposium, Limassol, Cyprus, November 9-11, 2016, Proceedings, volume 9995 of Lecture Notes in Computer Science, pages 235-253, 2016.
[FMB+16] Josselin Feist, Laurent Mounier, Sébastien Bardin, Robin David, and Marie-Laure Potet. Finding the needle in the heap: combining static analysis and dynamic symbolic execution to trigger use-afterfree. In Proceedings of the 6th Workshop on Software Security, Protection, and Reverse Engineering, SSPREW@ACSAC 2016, Los Angeles, California, USA, December 5-6, 2016, pages 2:1-2:12, 2016.
[Kin12] Johannes Kinder. Towards static analysis of virtualizationobfuscated binaries. In 19th Working Conference on Reverse Engineering, WCRE 2012, Kingston, ON, Canada, October 15-18, 2012, pages 61-70, 2012.
[Kin14] Johannes Kinder. Efficient symbolic execution for software testing. In Formal Methods in Computer-Aided Design, FMCAD 2014, Lausanne, Switzerland, October 21-24, 2014, page 5, 2014.
[SGS+16] Nick Stephens, John Grosen, Christopher Salls, Andrew Dutcher, Ruoyu Wang, Jacopo Corbetta, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. Driller: Augmenting fuzzing through selective symbolic execution. In 23nd Annual Network and Distributed System Security Symposium, NDSS 2016, San Diego, California, USA, February 21-24, 2016, 2016.
[YD15] Babak Yadegari and Saumya Debray. Symbolic execution of obfuscated code. In Indrajit Ray, Ninghui Li, and Christopher Kruegel, editors, Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, October 12-6, 2015, pages 732-744. ACM, 2015.
[YJWD15] Babak Yadegari, Brian Johannesmeyer, Ben Whitely, and Saumya Debray. A generic approach to automatic deobfuscation of executable code. In 2015 IEEE Symposium on Security and Privacy, SP 2015, San Jose, CA, USA, May 17-21, 2015, pages 674-691. IEEE Computer Society, 2015.
About the Instructor:
Dr. Sébastien Bardin joined CEA LIST, France, in 2006 as a full-time researcher, with research activities centered on program analysis and automatic software verification. For a few years now, Sébastien has been interested in automating software-level security analysis by lifting formal methods developed for the safety-critical industry. More especially, he focuses on binary-level formal methods, vulnerability detection & assessment, and malware deobfuscation. He leads the "binary-level security" group at CEA LIST as well as several related research projects, and he is one of the main designers of the (open-source) BINSEC platform for binary-level code analysis. Sébastien Bardin obtained his PhD in 2005 at ENS Cachan, France, in the field of formal methods.