Decompiler Design - Creating Statements |
|
Prev: Loop Analysis
With the information we have gathered in the previous two steps (basic blocks and loops) we now have all we need to convert control flow instructions into high-level statements.
This process is called "structuring" in the literature, because we re-create the original program structure from an unstructured sequence of instructions.
Control flow instructions (branches, goto, labels) are converted into the following set of C statements:do { break; continue; } while; while() { break; continue; } for() { break; continue; } if() { } if() { } else { } return goto label
The final goal is to remove all goto and all labels.
We use a proven technique called "interval analysis". The general approach is to replace a number of basic blocks which are identified to have a known pattern into a single basic block, that is into a statement that has only one entry and only one exit. We'll do this by moving statements from the containing scope (initially there is only one scope, the entire procedure), into the scope(s) of the control statement, effectively creating a tree of basic blocks.
We start by structuring loop statements. The reason is that we want to convert goto instructions into break and continue statements before we attempt to convert conditional branches into if-else statements.
Having the list of loops in a procedure, we first identify the most nested loop (a loop that has no other loops inside). We then replace the header block of each loop with a do-while statement, and we move all blocks belonging to the loop into the do-while statement.
sort loopSet from innermost loop to outermost loop
for each loop in loopSet {
doWhile = new Statement(DoWhile)
doWhile->expr = new Expr(loop->blocks.last->lastCompareInstr)
doWhile->blocks = loop->blocks
doWhile->breakBlock = loop->blocks->postDominator
doWhile->continueBlock = loop->blocks.last
if doWhile->continueBlock->onlyStatement != If or
doWhile->continueBlock->onlyStatement.destination
!= header
doWhile->continueBlock = NULL
loop->header->lastStatement = doWhile
StructureBreakContinue(doWhile,
doWhile->continueBlock, doWhile->breakBlock)
}
StructureBreakContinue(Statement stmt, Block contBlock, Block breakBlock)
{
switch stmt.type {
case Goto:
if stmt.destination == contBlock
stmt.type =
Continue
else if stmt.destination ==
breakBlock
stmt.type =
Break
break
case If:
if stmt.elseBlocks
for each
statement in stmt.elseBlocks
StructureBreakContinue(statement, contBlock, breakBlock)
if stmt.trueBlocks
for each
statement in stmt.trueBlocks
StructureBreakContinue(statement, contBlock, breakBlock)
}
}
The above algorithm converts the following sequence:
....
....
Lheader:
do {
....
....
if(expr)
if(expr)
goto Lcontinue
continue;
....
....
if(expr)
if(expr)
goto Lbreak
break;
...
....
Lcontinue:
if(expr)
} while(expr);
goto Lheader
Lbreak:
...
As you can see, 2 goto statements and 2 labels have been removed. Note how we need to visit innermost loops first, and how we only descend the if-else statements but not do-while statements that we already discovered. This is because a break or continue statement only applies to the closest loop, not to any outer loop that may be active at that statement. goto statements that jump out of the innermost loop will remain as gotos in the final output.
Now that we have identified the basic loop statements we can improve the output by trying to eliminate even more gotos and labels, and also by creating more familiar loop constructs. After all, the do-while loop is not used as often as the more common while and for loops. These can now be created by checking a few conditions and rewriting the do-while loops that we just generated.
if(expr)
while(expr) {
goto Lbreak;
do {
....
....
} while(expr);
}
Lbreak:
....
....
Note that the break and continue points have remained the same during the conversion from do-while to while.
We can now go further and convert while loops into for loops by checking if there is an initialization and an increment statement for the same variable:
v = init_expr
while(expr comparing v) { for(v
= init_expr; expr comparing v; v = incr) {
....
Lcontinue:
v = incr // e.g. v += 1
}
}
....
Note that for loops have a different continue point than do-while and while loops, therefore the initial conversion of if-goto to if-continue statements would have failed. But now that we know the continue point for the loop, we can call StructureBreakContinue for the statements inside the for loop, so that gotos to the continue point are converted into continue statements.
Now that we have converted loop statements we can try to eliminating the remaining goto statements. These should be part of if or if-else statements, so we'll try to identify basic block configurations that allow us to convert if-goto sequences into if or if-else sequences.
The basic idea is the same as we used in structuring loops: we remove all blocks that belong to the if-goto sequence and make them children of the containing if-else statement.
StructureIfElse(Block block)
The above algorithm converts patterns like the following into
if-else
statements.
if(expr)
if(!expr) {
Note that we have to negate the condition to enter the true block (alternatively
we can swap the true and false blocks and keep the same condition). This pattern
will eliminate 2 gotos and possibly 2 labels.
A variant of the previous algorithm has no else portions. That is:
StructureIfs(Block block)
The above variant takes care of the following case:
if(expr)
if(!expr) {
Not all if-goto pairs will be converted by the above algorithms. Consider the
following pattern:
if(expr1)
The above sequence is indicative of a single if statement with two conditions.
That must be converted into a single expression concatenating the 2 conditions
with an && operator:
if(!expr1 && !expr2) {
There is no limit to the number of expressions that can be collapsed into the
same if statement, so recognizing the above pattern should be repeated as long
as we can collapse more and more expressions. Similarly, we can recognize
sequences of expressions that jump to the true block, and concatenate them using
the || operator:
if(expr1)
if(expr1 ||
The above algorithms and patterns work fairly well for specific instruction
sequences.
However their fairly simplistic approach fails miserably in presence of even the
slightest disturbance in the instruction sequence. In particular the presence of
complex expressions prevents the collapsing of if sequences. Consider the
following sequence:
MOV R1, var_1
R1 = var_1;
The sequence maps nicely into the if-or pattern. However, having a more complex
expression sequence will break the mapping:
MOV R1, var_1
R1 = var_1; // basic block 1
This is because the second basic block has multiple statements as opposed to a
single if statement. Since an expression cannot contain statements, we cannot
collapse the two compare instructions into a single if expression.
In order to be able to reduce the number of statements, thus opening the door
for further statement structuring, we need to collapse simple expressions into
more complex ones.
Data flow analysis helps in reducing the
non-control-flow statements, and opens the door to better statement
structuring.
Next: Data flow analysis
if block.succ.size != 2
return false
trueBlock = block.succ[0]
falseBlock = block.succ[1]
if trueBlock.succ.size != 1 or falseBlock.succ.size != 1
return false
if falseBlock.succ[0] != trueBlock.succ[0]
return false
ifStmt = new Statement(If)
ifStmt.expr = NegateCondition(block.lastStmt)
ifStmt.trueBlocks = trueBlock
ifStmt.falseBlocks = falseBlock
removeLastGoto(trueBlock, trueBlock.succ[0])
removeLastGoto(falseBlock, falseBlock.succ[0])
return true
goto Lfalse
....
....
goto Lelse
}
Lfalse:
else {
....
....
Lelse:
}
....
....
if block.succ.size != 2
return false
trueBlock = block.succ[0]
falseBlock = block.succ[1]
if trueBlock.succ.size == 1 and trueBlock.succ[0] == falseBlock
ifStmt = new Statement(If)
ifStmt.expr = NegateCondition(block.lastStmt)
ifStmt.trueBlocks = trueBlock
ifStmt.falseBlocks = NULL
removeLastGoto(trueBlock, falseBlock)
goto Lfalse
....
....
Lfalse:
}
More if structuring
goto Lfalse
if(expr2)
goto Lfalse
....
Lfalse:
....
....
}
goto Ltrue
if(expr2)
!expr2) {
goto Lfalse
Ltrue:
....
....
Lfalse:
}
....Limitations
CMP R1,#10
if(R1 == 10 ||
JEQ L1
CMP R1,#20
R1 == 20) {
JNE L2
L1: ....
L2:
}
CMP R1,#10
if(R1 == 10)
JEQ L1
goto L1;
MOV R1, var_2
R1 = var_2; // basic block 2
CMP R1,#20
if(R1 != 20)
JNE L2
goto L2;
L1: ....
L1: ...
L2:
L2: