REC - Reverse Engineering Compiler User's Manual
Table of Contents
Starting REC
Interactive Mode
Command Files Syntax
Theory of Operation
Output examples
List of Options
Starting REC |
---|
REC is invoked with the following command line syntax:

    rec [{+|-}optionname ...] exec_file

To activate an option, precede its name with a + (plus) sign. To disable an option, precede it with a - (minus) sign. To get the list of all the options and their current values, type:

    rec +help

The minimum input to REC is the binary executable file. For example:

    rec file.exe

If file.exe is in one of the recognized formats, it will be read, and a file.rec will be produced using the default options, without further intervention from the user.

REC can operate in three modes:
- batch mode: by default, the user must provide an executable or command file name when invoking REC. This file is opened and analyzed, and an output file with the same name as the input file and the extension .rec is produced, without further intervention by the user.
- full screen interactive mode: in this mode, the user can interactively analyze the input file by disassembling or decompiling individual procedures. The user also has access to a hexadecimal viewer, and can view some of the data that REC uses internally, such as the list of strings, labels, procedures etc. REC enters interactive mode when invoked from the command line with the +interactive option.
- HTML generation mode: in this mode REC reads the standard input for commands, and generates an HTML page as the result of each command typed. This mode is used on UNIX to allow a web browser like Netscape to act as the user interface of the decompiler. A proxy program is needed to translate the browser's requests into REC's standard input commands. Check the HTTP Server setup page for a description of how to use this mode. REC uses HTML generation mode when invoked from the command line with the +html option.

The other options are used to debug the program, or to tune its output. Understanding the complete list of options requires an understanding of the algorithms and phases that REC performs to transform an executable file into a source file. If you don't know the meaning of an option, you can experiment by enabling it and checking whether the output is clearer. Note that some options are only valid if another option is enabled.
The same set of options is available regardless of the host/target combination.
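The +name / -name convention described above can be illustrated with a short C sketch. This is not REC's own code: the function name parse_option and its return convention are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of REC: splits an argument such as
 * "+hexconst" or "-interactive" into an option name and an on/off flag.
 * Returns 1 if arg looks like an option, 0 otherwise (for example when it
 * is the name of the executable or command file). */
int parse_option(const char *arg, const char **name, int *enabled)
{
    assert(name != NULL && enabled != NULL);
    if (arg == NULL || (arg[0] != '+' && arg[0] != '-') || arg[1] == '\0')
        return 0;
    *enabled = (arg[0] == '+');   /* '+' enables, '-' disables */
    *name = arg + 1;              /* option name without the sign */
    return 1;
}
```

A caller would loop over argv, apply every argument for which parse_option returns 1, and treat the remaining argument as the input file.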
Interactive Mode |
Interactive mode is used to analyze the program being decompiled. This mode is useful to access the hexadecimal viewer, and to inspect many of the internal lists maintained by REC, such as the strings list, the labels list, etc.
To use REC in interactive mode, the user must invoke it with the following command line:

    rec +interactive file.exe

REC will start analyzing file.exe to find which area contains strings, code and data. It will also build the list of labels and branches, and then will try to build a list of the procedures contained in the program.
After this phase, the main menu will be presented:
Reverse Engineering Compiler 1.4 (C) Giampiero Caprino (Nov. 15 1998)
r : show regions
d : dump regions
l : show labels
b : show branches
j : show jump tables
s : show strings
y : show symbols
p : show procedures
o : show options
D : hexdump file
Q : quit program
REC's user interface is based on a simple list browser. The user can type the following keys while in the list browser:
- Up arrow or BS key : moves the cursor one line up
- Down arrow or Enter key : moves the cursor one line down
- Page Up or Ctrl-B key : shows the previous page
- Page Down or Ctrl-F key : shows the next page
- Right arrow when cursor is on a highlighted word: executes the command associated with the word
- If there is a menu, typing any highlighted letter from the menu executes the command associated with the letter
- Left arrow or 'Q' or ESCape key exits the current screen and returns to the previous screen
- The exclamation mark '!' is used to request the evaluation of numeric expressions
- The forward slash '/' character is used to search for a string in the current list. The question mark '?' character searches for a string backwards. The 'n' character repeats the last search in the same direction. The 'N' character repeats the last search in the opposite direction.
Region List
The region list shows how the input file is organized. Structured file formats, like COFF and ELF, have separate areas for code, data and auxiliary information. The region list shows which areas REC will consider for decompilation (marked with the text type), and which areas will be searched for ASCII strings (marked with the data type).
The user can force REC to consider a file region to be text or data via the region: command of a command file.
Labels List
The labels list shows all the addresses that are the destination of a branch or call instruction. This list is used when building the procedure list. If REC incorrectly treats a data area as a text area, it can create labels that are not part of any text region. This usually causes an incorrect procedure list. The user can then change the region list until all incorrect labels are eliminated.
Branch List
The branch list shows all the addresses that have a branch, call or return instruction. This list is used when building the procedure list. If REC incorrectly treats a data area as a text area, it can create branches whose destination is not part of any text region. This usually causes an incorrect procedure list. The user can then change the region list until all incorrect branches are eliminated.
Jump Table List
The jump table list shows all those areas that may contain a table of addresses inside a text region. These are usually generated when compiling switch() statements. It is important that REC recognizes these tables because the control flow analyzer depends on this data to identify all the instructions of a procedure, and also to avoid treating data bytes as instructions.
Strings List
The string list shows those portions of data regions that may contain ASCII strings. These strings will then be used as parameters to functions such as printf() and strcpy(), among others.
Symbols List
This list shows every symbolic name associated with an address. These are usually names of procedures (belonging to a text region) or names of global variables (belonging to a data region). The symbol names and addresses are taken from the file's symbol table, if available. The symbol list also shows the list of imported symbols (from a types: or prototype file), and the list of user specified symbols (entered via the symbol: command in a .cmd file).
Procedure List
The procedure list shows all the addresses where REC has identified a user procedure. Some of these addresses may come from the Symbols List, in which case the name of the procedure is also shown. For static functions and for files without a symbol table, the entry point of the procedure is used as its name.
Options List
The option list allows the user to enable or disable each option. Some options are used to produce a better output, some to enable alternative analysis algorithms, and some enable internal debugging features.
Hexdump Viewer
The hexdump viewer shows the content of the input file in hexadecimal, one page at a time. The usual cursor movement characters can be used to navigate through the dump. This mode is very useful to look at areas that REC has not recognized as code or data.
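As a rough illustration of what one line of such a dump can look like, the C sketch below formats an offset, up to 16 bytes in hexadecimal, and an ASCII column. The layout, the 16-byte width and the function name format_hex_line are assumptions for this example. REC's actual viewer may present the data differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define BYTES_PER_LINE 16   /* assumed line width, not taken from REC */

/* Formats one dump line into out: "oooooooo: hh hh ... " followed by an
 * ASCII column in which non-printable bytes are shown as '.'.
 * Returns the number of characters written (without the terminating NUL),
 * or -1 if the output buffer is too small. */
int format_hex_line(char *out, size_t outsz, unsigned long offset,
                    const unsigned char *data, size_t count)
{
    size_t pos, i;
    int n;

    assert(out != NULL && data != NULL && count <= BYTES_PER_LINE);
    n = snprintf(out, outsz, "%08lx: ", offset);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    pos = (size_t)n;
    for (i = 0; i < BYTES_PER_LINE; i++) {
        if (i < count)
            n = snprintf(out + pos, outsz - pos, "%02x ", data[i]);
        else
            n = snprintf(out + pos, outsz - pos, "   ");  /* pad short lines */
        if (n < 0 || (size_t)n >= outsz - pos)
            return -1;
        pos += (size_t)n;
    }
    for (i = 0; i < count; i++) {
        if (pos + 1 >= outsz)
            return -1;
        out[pos++] = isprint(data[i]) ? (char)data[i] : '.';
    }
    out[pos] = '\0';
    return (int)pos;
}
```

Padding short lines keeps the ASCII column aligned on the last line of a file, which makes unrecognized areas easier to scan by eye.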
Theory of Operation |
---|
The following block diagram shows REC's interaction with the files it uses/produces:
The minimum input to REC is the binary executable file. For example:

    rec file.exe

If file.exe is in one of the recognized formats, it will be read, and a file.rec will be produced using the default options, without further intervention from the user.

However, since decompilation is a very difficult process, the more additional information the user can provide, the better the output.
For example, alternative algorithms could be selected, based on the compiler used to compile the executable file, or based on readability or output preferences. To change any of the default options, REC reads the content of the file .recrc (rec.cfg on MSDOS and Windows). Each line in this file contains an option, as if that option was entered on the command line. For example, if you always want REC to start in interactive mode and to always print numeric constants in hexadecimal, use the following lines in the .recrc file:

    +interactive
    +hexconst

These options can be overridden by command line options. For example, to run REC in batch mode even though the .recrc has a +interactive option, invoke REC with the following command line:

    rec -interactive file.exe

A type file is used to tell REC the names and declarations of high-level objects, such as structs, unions, arrays and functions. By providing a type file, the user can improve the readability of the generated output, because variables will have symbolic names.
This is particularly useful to specify the name, type and number of function parameters. A number of type files for several Linux and Windows system calls are available from the download page. To use the prototype files, you need to either specify them using the types: command of a .cmd file, or add their pathnames to the proto.lst file and put this file in the directory REC is run from.
Command Files and handling unrecognized formats |
---|
The input file could be in a format not yet recognized by REC. In this case, REC has no knowledge of which areas of the file contain data, which contain code and which contain auxiliary information. REC can then be given this information in an ASCII file, called a command file. In this command file, a lot more information can be provided, including predefined types, addresses of functions and configuration options. For example, REC could be invoked with the following command line:

    rec file.cmd

where file.cmd has the following content:

    #!wrec
    option: +hexconst
    types: string.o
    types: stdio.o
    file: file.exe 0x50 0x53
    region: 0x80100000 0x801009b4 0x800 data
    region: 0x801009b4 0x8010c1e8 0x11b4 text
    region: 0x8010c1e8 0x80120800 0xc888 data
    symbol: 0x80107fe0, 0x80108077 T CrearImage()
    symbol: 0x80108078, 0x801080d7 T LoadImage(char *, int, int)
    symbol: 0x801080d8, 0x8010813b T StoreImage()
    symbol: 0x8010813c, 0x801081ff T MoveImage(char *, int, int)
    patterns: libmips.pat

The file starts with a magic-id: #!wrec. This must be on the first line. Each line contains one command followed by a colon sign (:) and by some arguments. Comments are preceded by a '#' character. The remainder of the line after the '#' is ignored.

Each option: command sets one of REC's options. These options override those provided on the command line.
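The line format described above (a command name, a colon, then arguments, with '#' starting a comment) can be sketched in C as follows. This is only an illustration under stated assumptions, not REC's parser: split_command is a hypothetical name, the line is modified in place, and every '#' is treated as the start of a comment.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: splits one command-file line of the form
 * "command: arguments" in place.
 * Returns  1 and sets *cmd and *args on success,
 *          0 for blank or comment-only lines (this includes the "#!wrec"
 *            magic line, which a caller would check separately),
 *         -1 if the line has no colon. */
int split_command(char *line, char **cmd, char **args)
{
    char *p, *colon, *end;

    assert(line != NULL && cmd != NULL && args != NULL);
    p = strchr(line, '#');
    if (p != NULL)
        *p = '\0';                              /* drop the comment */
    while (isspace((unsigned char)*line))
        line++;                                 /* skip leading blanks */
    if (*line == '\0')
        return 0;
    colon = strchr(line, ':');
    if (colon == NULL)
        return -1;
    *colon = '\0';
    *cmd = line;
    p = colon + 1;
    while (isspace((unsigned char)*p))
        p++;                                    /* skip blanks after ':' */
    end = p + strlen(p);
    while (end > p && isspace((unsigned char)end[-1]))
        end--;                                  /* trim trailing blanks */
    *end = '\0';
    *args = p;
    return 1;
}
```

A reader loop would first verify that the first line is exactly the magic-id, then call split_command on each following line and dispatch on the command name (option, types, file, region, symbol, patterns).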
The types: commands specify one or more ELF files with STAB symbolic information. These files are read to get predefined types and function prototypes. To create a types file, you can simply use Linux's system compiler (or gcc on a Solaris system) with the -g option. For example, to let REC know the types of the functions declared in the string.h header file, you can compile the following C source with the command line "gcc -g -c string.c":

    /* string.c - types defined by string.h */
    #include <stddef.h>
    int strcmp(const char *s1, const char *s2) { }
    int strncmp(const char *s1, const char *s2, size_t len) { }
    char *strcpy(char *dst, const char *src) { }
    char *strchr(const char *s, int ch) { }
    ....

REC will add the prototype information to the symbols specified by the symbol: commands or to those found by the patterns: command. The actual code for the compiled functions is ignored, as well as their addresses. Note that the compiler will not generate symbolic information for functions that are not defined in the file, hence the { } at the end of each function.

The file: command specifies the binary file to be loaded. There should be only one file: command. After the file name, the magic argument specifies an optional identifier that must be present at the beginning of the file (magic number).
The region: commands specify the layout of the binary file. The arguments are the start and end memory addresses at which the code and data will be loaded into memory, and the file offset where the section starts. Note that no actual loading occurs. The addresses are only used for informational purposes (they must be correct for call statements to be meaningful). The last argument is the region type, and affects the operation performed on the content of the region. Only text regions are considered for decompilation. Data regions are scanned to find ASCII strings and generic pointers.
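The relationship between memory addresses and file offsets implied by the region: arguments can be sketched in C. The struct and function names below are illustrative and are not REC's internals. The sketch also assumes that the end address is exclusive, which the manual does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative types, not taken from REC. */
enum region_type { REGION_DATA, REGION_TEXT };

struct region {
    unsigned long start;    /* first memory address of the region */
    unsigned long end;      /* assumed exclusive: first address past it */
    unsigned long offset;   /* file offset where the region starts */
    enum region_type type;
};

/* Returns the region containing addr, or NULL if no region covers it. */
const struct region *find_region(const struct region *r, size_t n,
                                 unsigned long addr)
{
    size_t i;

    assert(r != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (addr >= r[i].start && addr < r[i].end)
            return &r[i];
    return NULL;
}

/* Maps a memory address to its position in the file, or -1 if unmapped. */
long addr_to_file_offset(const struct region *r, size_t n, unsigned long addr)
{
    const struct region *hit = find_region(r, n, addr);

    if (hit == NULL)
        return -1;
    return (long)(hit->offset + (addr - hit->start));
}
```

With the three regions of the sample command file, address 0x80107fe0 falls inside the text region that starts at 0x801009b4 with file offset 0x11b4, so under these assumptions it maps to file offset 0x11b4 + 0x762c = 0x87e0.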
In the example, the first region: command reads as follows:

    region: 0x80100000   0x801009b4   0x800           data
            start addr   end addr     region offset   type

The symbol: commands specify starting and ending addresses of functions, along with a symbolic name and possibly a list of parameters for the function. The ending address is optional, and can be computed by REC automatically (see later). The ANSI-C style prototype is also optional, and its use is actually discouraged, as types should be defined in a type file (see the types: command). It is better to simply specify that the symbol is a function by adding ( ).

The patterns: commands specify one or more files containing a list of hex strings (patterns) and symbolic names. REC will search the executable file for each pattern, and when a pattern is found, it will assign the symbolic name associated with the pattern to the address where the pattern begins. The following is an example of a pattern file:

    open() size: 16
    A0 00 0A 24 08 00 40 01 00 00 09 24 00 00 00 00 ;
    lseek() size: 16
    A0 00 0A 24 08 00 40 01 01 00 09 24 00 00 00 00 ;
    ...

Each pattern can be up to 256 bytes. These patterns are sometimes called signatures in the literature. The size: option tells REC how many bytes the function occupies in the binary file. For example, you can specify a 16-byte pattern for a 3000-byte function.
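The pattern search described above amounts to looking for a byte sequence inside the loaded file. The C sketch below uses a plain byte-by-byte scan. The struct layout and the name find_pattern are assumptions for the example, and REC's actual search strategy may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature entry, loosely modelled on the pattern file. */
struct pattern {
    const char *name;             /* symbolic name, e.g. "open" */
    const unsigned char *bytes;   /* signature bytes */
    size_t length;                /* number of signature bytes (up to 256) */
};

/* Returns the offset of the first occurrence of pat in buf, or -1 if the
 * pattern does not occur. */
long find_pattern(const unsigned char *buf, size_t buflen,
                  const struct pattern *pat)
{
    size_t i;

    assert(buf != NULL && pat != NULL && pat->bytes != NULL);
    if (pat->length == 0 || pat->length > buflen)
        return -1;
    for (i = 0; i + pat->length <= buflen; i++)
        if (memcmp(buf + i, pat->bytes, pat->length) == 0)
            return (long)i;       /* the symbol would be assigned here */
    return -1;
}
```

The two sample signatures differ in a single byte (the ninth), so both must be compared in full before a name is assigned. The returned offset is a file offset and would still need to be translated to a memory address through the region list.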
Output Examples |
---|
When the end of the command file is reached, and/or when REC has finished analyzing the executable file, it will either enter interactive mode, or it will process the entire executable file. Currently there can be two types of output:
- If the +disasmonly option was specified, a file with the .dis extension will be produced. In this file, every region with the text attribute will be disassembled, and every region with the data attribute will be hexdumped.
- Without any option, a file with the .rec extension will be produced with a C-like representation of each procedure in each text section. The C-like representation is not perfect, and cannot be fed to a compiler to recreate the original binary. Its goal is to provide the user a better understanding of the structure of the program. The following is an example of the C-like output:

    hexdump(char * fname)
    {
        unsigned char buff[16];
        unsigned long offset;
        struct _IO_FILE* fp;
        struct stat st;
        int cnt;

        if(stat(fname, & st) != 0) {
            fp = fopen(fname, "rb");
            if(fp != 0) {
                offset = 0;
    L08048867:
                if(st.st_size > offset) {
                    cnt = fread( & buff, 1, 16, fp);
                    if(cnt != 0) {
                        dumpline( & buff, offset, cnt);
                        offset = offset + cnt;
                        goto L08048867;
                    }
                } else {
                }
                fclose(fp);
                eax = 0;
            } else {
                perror(fname);
                eax = 1;
            }
        } else {
            perror(fname);
            eax = 1;
        }
    }

Additional output files could be produced if any of the debugging options were enabled. These files are used to produce the intermediate representation of the decompiled file during different stages of the decompilation process.
Options List |
---|
The following is a list of all the options supported by REC. The options are presented in hierarchical order. This means that some options are meaningful only if the parent option has been enabled.
- +/-help
this option simply prints the list of all the options and their current values on the standard output, and then exits REC.
- +/-interactive
disable/enable interactive mode. When in interactive mode, no output file is generated. However, you can see internal information (such as the label list, the branch list, the string list etc.), invoke an interactive hexdump, and decompile individual procedures in any order.
- +/-html
disable/enable HTML generation mode. This mode is only useful if REC was called from a CGI script that acts as an HTTP server. REC will read commands from the standard input, and produce an HTML page after each command.
- +/-silent
this option will disable the output of the trace information during the decompilation process. If this option is disabled, REC prints the current activity on the standard output.
- +/-validatestr
this option enables the analysis of the input file data areas to detect ASCII strings.
- +/-dfoprocs
this option is used to tell REC to only decompile procedures that can be reached from the entry point. The order used by REC is bottom-up, that is, the deepest procedure (the farthest from the entry point) is decompiled first; the entry procedure is decompiled last. This allows more accurate acquisition of information such as the number and types of each procedure's parameters.
- +/-locals
this option enables/disables the conversion of stack and register references to procedure arguments and local variables.
- +/-rdonly
this option tries to substitute register references when the only assignment to the register is that of a formal parameter.
- +/-simplifyexprs
when this is enabled, processor idioms are converted into more regular expressions. For example, an instruction such as "EAX = EAX ^ EAX" is converted into the expression "EAX = 0". This helps the data flow analyzer.
- +/-doblocks
this option builds the control-flow graph for each procedure. It must be enabled for the data flow analyzer (+compsets option) to work correctly.
- +/-compsets
this option enables/disables the register lifetime analysis. This analysis greatly helps in the elimination of register variables by the following pass. If it is disabled, most if not all of the produced C expressions will use a lot of register references.
- +/-compactexprs
this option enables the elimination of register temporary variables, and the creation of complex expressions. The number of output statements is greatly reduced by this option, but the complexity of each statement increases.
For example, the following instructions:

    EAX = 1;
    EAX = EAX + *EBX;
    PUSH(EAX);
    CALL 0x1000

can be compacted into the following expression:

    L1000(1 + *EBX);

- +/-types
this option enables variable type detection. Type detection is only partially implemented at this time.
- +/-compactifs
This option converts sequences of compare + conditional branch instructions into if-goto-else-goto statements. This is the first stage where actual C code can be produced. The following stages only try to better structure the output by using more complex C statements.
- +/-displaylabels
When this option is enabled, labels are always printed in the output, even if there is no goto statement to that address. This is useful to compare the C output with the disassembler output.
- +/-dostmts
This option enables/disables the structuring of the output into more complex C statements. Each type of C statement can also be individually enabled or disabled.
- +/-donullgotos
This option enables REC to remove goto statements that jump to the next (sequential) statement.
- +/-doifs
during the generation of C statements, the if statement is always represented as a sequence of if-goto-else-goto. This representation simplifies moving if statements around. When this option is enabled, REC tries to remove the goto statement inside the true and false blocks by substituting code from the destination of the goto statement, or by removing the else part altogether.
- +/-doloops
when this option is enabled, loop analysis is used to turn if-goto statements into while or do-while statements.
- +/-dowhile
enable/disable while statement detection.
- +/-dofor
when this option is enabled, REC tries to compact while or do-while statements into a single for statement.
- +/-dopackloops
this option enables the rewriting of endless while loops into do-while loops, when an if statement at the end of the while block would cause the loop to either continue or end.
- +/-dopackstmt
this option enables REC to compact statements, primarily if statements. For example, boolean && and || conditions are used to merge two consecutive if statements. Also, the conditional assignment operator (? :) is created when there is a sequence if(e1) v = e2; else v = e3; This option can create very good looking output.
- +/-doswitch
enable/disable switch statement detection.
- +/-dosort
this option tries to reduce the depth of conditional statements by rearranging compound statement blocks in the output.
- +/-flag16
this option forces the i386 disassembler to work in real-mode (16-bit mode) as opposed to the default protected (32-bit) mode. This option is only valid when decompiling x86 files.
- +/-int16
this option specifies that integers are 16 bits instead of 32 (default). This is useful for older targets like the 8086 or the Macintosh's 68000.

I might add more options as I add other features.
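The bottom-up order described for the +dfoprocs option can be pictured as a post-order walk of the call graph that starts at the entry point. The C sketch below is one possible reading of that description, not REC's implementation. The call_graph type, the adjacency matrix and the MAX_PROCS bound are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCS 8   /* small fixed bound, for illustration only */

/* calls[i][j] != 0 means procedure i calls procedure j (hypothetical). */
struct call_graph {
    size_t count;
    unsigned char calls[MAX_PROCS][MAX_PROCS];
};

static void visit(const struct call_graph *g, size_t proc,
                  unsigned char *seen, size_t *order, size_t *n)
{
    size_t callee;

    seen[proc] = 1;               /* also stops recursion on call cycles */
    for (callee = 0; callee < g->count; callee++)
        if (g->calls[proc][callee] && !seen[callee])
            visit(g, callee, seen, order, n);
    order[(*n)++] = proc;         /* emitted only after all of its callees */
}

/* Fills order[] with the procedures reachable from entry, callees first and
 * the entry procedure last. Returns how many procedures were emitted;
 * procedures that cannot be reached from entry are left out. */
size_t bottom_up_order(const struct call_graph *g, size_t entry, size_t *order)
{
    unsigned char seen[MAX_PROCS] = {0};
    size_t n = 0;

    assert(g != NULL && order != NULL);
    assert(g->count <= MAX_PROCS && entry < g->count);
    visit(g, entry, seen, order, &n);
    return n;
}
```

Processing callees before their callers means that, by the time a call site is decompiled, the number and types of the callee's parameters may already be known, which is the benefit the option description mentions.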
TODO List:
Things that I still need to add (I'm working on them in my spare time):
- Automatic detection of parameters. This is made more difficult by the fact that on MIPS 4 parameters are passed in registers, and the others are passed on the stack.
- Automatic detection of local variables. This is not very difficult on MIPS and PowerPC, since they use a fixed stack pointer. It's more difficult for i386 and mc68k, which have a variable-size stack frame.
- Automatic detection of variable types.
- Removal of all registers from expressions.
Copyright © 1997 - 2007 Backer Street Software -- All rights reserved.
Last revised on 10 Mar. 1999