Decompiler Design - Introduction |
|
Decompilation is a form of reverse engineering of computer programs. Its goal is to convert a compiled binary file into a source file. One may want to do this for several reasons, such as to understand how a program works, or to try to modify a program to enhance it or fix a bug.
Decompilation has been around for many years, probably ever since people started to compile programs from high-level languages to lower-level formats such as assembly and machine code.
Several attempts have been conducted at writing binary executable decompilers. This page has some examples.
There are even more decompilers available for managed environments that use byte-code, like Java and C#. An extensive list is available at the program transformation wiki.
In these pages we'll focus on decompilation of binary executables, or from machine code to source code, since this is much more difficult than decompiling Java byte-code or C#.
Several languages and compilers can produce machine code, including some Java, C# and Visual Basic compilers. Therefore the decompiler will have to know which language was used to compile the program, and will have to have support to generate that language. However, most of the difficult problems in decompilation appear when using less restrictive languages, namely C, Pascal or C++.
Most of the algorithms can be used for all languages, so we will mostly use examples written in C. When we show an algorithm, we'll use C or C++, since these are the most available languages on both Linux and Windows.
Regardless of the target language, a decompiler mostly deals with 3 kinds of entities:
The constant problem of a decompiler is to try to infer one or more of the three entities above from a sequence of bytes found in the binary file. In order to do this, it's useful to know how development tools (which we call 'forward engineering'tools) are used when writing a program, since that's the process we're trying to revert.
Next: Forward Engineering Tools