Forward Engineering Tools (compilers, assemblers, linkers)
Forward engineering tools are programs that
move a program from a human-centric level of
abstraction towards a machine-centric level of abstraction.
Most programmers' main interface to the machine
is the compilation environment.
The compilation environment takes as input one or more files
in a high-level language such as C or Java, plus a number of
supporting files such as resource files and libraries, and converts
all of them into an executable for a particular execution environment,
such as Linux or Windows. Higher-level program representation formats
such as UML are not considered here, although they can also
be the target of decompilation.
This is accomplished through a number of steps that involve individual programs:
- Each high-level language source file is compiled into
assembly by a compiler for that high-level language.
- Each assembly language file, whether created by a compiler
or written directly by the programmer, is converted into a relocatable
object file by an assembler program. The assembler is not concerned
with which language was used to write the high-level source file.
It is only concerned with which processor will execute the binary code.
This is the first step where information can be lost, since the assembler
may not see much of the information that is important to the programmer,
such as local variable names and types.
- The linker combines the relocatable object files with a number
of libraries that support the target execution environment.
The linker may not care about the processor that will execute the program.
It may only care about what information is required for the program to be
loaded by the target operating system. The linker may also remove from the
generated binary file any information that it considers unnecessary for executing the program.
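The steps above can be followed by hand on a single small file. The sketch below is illustrative only: the file name average.c, the function average and the object main.o are hypothetical, and the commands in the comment assume a GCC-style toolchain on a Unix-like system.

```c
/* average.c: a hypothetical file used to follow the three steps above.
 *
 * Assuming a GCC-style toolchain on a Unix-like system:
 *   cc -S average.c               compiler:  average.c -> average.s
 *   as average.s -o average.o     assembler: average.s -> average.o
 *   cc average.o main.o -o prog   linker (main.o is a hypothetical
 *                                 object file that provides main)
 */
#include <stddef.h>

/* The names samples, count, total and i matter to whoever reads this
 * file. Without debugging information, the generated assembly typically
 * refers only to registers and stack offsets, so the assembler never
 * sees these names. */
int average(const int *samples, size_t count)
{
    long total = 0;

    if (count == 0)
        return 0;               /* avoid dividing by zero */
    for (size_t i = 0; i < count; i++)
        total += samples[i];
    return (int)(total / (long)count);
}
```

Reading average.s after the first command is a quick way to see which source-level names survive the compiler and which do not.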
As one can see, each tool removes from its output some information that was
vital to the programmer who wrote the program, since that information is not
necessary for the final execution of the program.
In addition, the programmer may explicitly instruct each
tool to generate or remove valuable information.
When using a compiler, the user may decide to:
- generate additional information to improve the debuggability
of the code (the -g command line option of Unix compilers is used for this purpose)
- generate code that is more difficult for humans to understand
but executes more efficiently on the processor; that is, generate optimized code through
the -O1, -O2 or higher command line options. The transformations that optimizing
compilers perform on the generated instructions make the final code less readable,
even when debugging information is present in the final file and the original
source is available and inspected through a debugger.
Debugging optimized code is a worthy area of research in its own right,
and will not be considered in this document, although many of the techniques
can be applied to a 'de-optimizing debugger'.
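As a small illustration of the two choices above, the following sketch can be compiled both ways and the resulting assembly compared. The file name sum.c and the function sum_to are hypothetical, the commands assume a GCC-style compiler, and the exact transformations applied depend on the compiler and its version.

```c
/* sum.c: a hypothetical file for comparing compiler options.
 *
 * Assuming a GCC-style compiler:
 *   cc -g -O0 -S sum.c -o sum_debug.s   debug info, code follows the source
 *   cc -O2 -S sum.c -o sum_opt.s        optimized code
 *
 * At -O0 the loop below usually appears as a recognizable loop, with
 * total and i kept in stack slots. At -O2 the compiler may unroll or
 * vectorize the loop, or replace it with arithmetic that has no loop
 * at all, so the result no longer mirrors the source line by line.
 */
unsigned sum_to(unsigned n)
{
    unsigned total = 0;

    /* Adds 1 + 2 + ... + n; the result wraps around on overflow. */
    for (unsigned i = 0; i < n; i++)
        total += i + 1;
    return total;
}
```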
When using any of the other tools, the user may also affect the operation of a decompiler,
for example by instructing the linker to remove any symbolic information from the binary file.
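The effect of removing symbolic information can be observed with standard Unix tools. This is a sketch under assumptions: the file name checksum.c and the functions fold and checksum are hypothetical, and the commands assume a GCC-style toolchain with the nm and strip utilities available.

```c
/* checksum.c: a hypothetical file for observing symbol removal.
 *
 * Assuming a GCC-style toolchain with nm and strip available:
 *   cc -c checksum.c     produce checksum.o
 *   nm checksum.o        lists checksum as a global symbol and, when it
 *                        has not been inlined, fold as a local one
 *   strip prog           after linking into a program (here called prog),
 *                        removes the symbol table; linking with the -s
 *                        option has a similar effect
 *
 * Once the symbols are gone, a decompiler sees only addresses and has
 * to invent names for both functions.
 */
#include <stddef.h>
#include <stdint.h>

/* Rotate the accumulator left by one bit, then mix in the next byte. */
static uint32_t fold(uint32_t acc, unsigned char byte)
{
    return ((acc << 1) | (acc >> 31)) ^ byte;
}

uint32_t checksum(const unsigned char *data, size_t len)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < len; i++)
        acc = fold(acc, data[i]);
    return acc;
}
```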
From this point on, any tool that we can use to understand
the program can be considered a reverse engineering tool.