| Decompiler Design - Creating Statements | 
| 
 | 
Prev: Loop Analysis
With the information we have gathered in the previous two steps (basic blocks and loops) we now have all we need to convert control flow instructions into high-level statements.
This process is called "structuring" in the literature, because we re-create the original program structure from an unstructured sequence of instructions.
Control flow instructions (branches, goto, labels) are converted into the following set of C statements:
do { break; continue; } while;
while() { break; continue; }
for() { break; continue; }
if() { }       if() { } else { }
return
goto label
The final goal is to remove all goto and all labels.
We use a proven technique called "interval analysis". The general approach is to replace a number of basic blocks which are identified to have a known pattern into a single basic block, that is into a statement that has only one entry and only one exit. We'll do this by moving statements from the containing scope (initially there is only one scope, the entire procedure), into the scope(s) of the control statement, effectively creating a tree of basic blocks.
We start by structuring loop statements. The reason is that we want to convert goto instructions into break and continue statements before we attempt to convert conditional branches into if-else statements.
Having the list of loops in a procedure, we first identify the most nested loop (a loop that has no other loops inside). We then replace the header block of each loop with a do-while statement, and we move all blocks belonging to the loop into the do-while statement.
    sort loopSet from innermost loop to outermost loop
	for each loop in loopSet {
   
        doWhile = new Statement(DoWhile)
    doWhile->expr = new Expr(loop->blocks.last->lastCompareInstr)
    doWhile->blocks = loop->blocks
    doWhile->breakBlock = loop->blocks->postDominator
    doWhile->continueBlock = loop->blocks.last
	
    if doWhile->continueBlock->onlyStatement != If or
       doWhile->continueBlock->onlyStatement.destination 
	!= header
        doWhile->continueBlock = NULL
	
    loop->header->lastStatement = doWhile
    StructureBreakContinue(doWhile, 
	doWhile->continueBlock, doWhile->breakBlock)
	}
	
	StructureBreakContinue(Statement stmt, Block contBlock, Block breakBlock)
	{
    switch stmt.type {
	
    case Goto:
        if stmt.destination == contBlock
            stmt.type = 
	Continue
        else if stmt.destination == 
	breakBlock
            stmt.type = 
	Break
        break
	
    case If:
        if stmt.elseBlocks
            for each 
	statement in stmt.elseBlocks
                
	StructureBreakContinue(statement, contBlock, breakBlock)
        
        if stmt.trueBlocks
            for each 
	statement in stmt.trueBlocks
                
	StructureBreakContinue(statement, contBlock, breakBlock)
    }
	}
The above algorithm converts the following sequence:
   ....                             
....
Lheader:
                                    
do {
   ....                                 
....  
   if(expr)                             
if(expr)
      goto Lcontinue                        
continue;
   ....                                 
....
   if(expr)                             
if(expr)
      goto Lbreak                           
break;
   ...                                  
....
Lcontinue:
   if(expr)                         
} while(expr);
      goto Lheader
Lbreak:
   ...
As you can see, 2 goto statements and 2 labels have been removed. Note how we need to visit innermost loops first, and how we only descend the if-else statements but not do-while statements that we already discovered. This is because a break or continue statement only applies to the closest loop, not to any outer loop that may be active at that statement. goto statements that jump out of the innermost loop will remain as gotos in the final output.
Now that we have identified the basic loop statements we can improve the output by trying to eliminate even more gotos and labels, and also by creating more familiar loop constructs. After all, the do-while loop is not used as often as the more common while and for loops. These can now be created by checking a few conditions and rewriting the do-while loops that we just generated.
   if(expr)                         
while(expr) {
      goto Lbreak;
   do {
      ....                              
....
   } while(expr);                   
}
Lbreak:
   ....                             
.... 
Note that the break and continue points have remained the same during the conversion from do-while to while.
We can now go further and convert while loops into for loops by checking if there is an initialization and an increment statement for the same variable:
   v = init_expr
   while(expr comparing v) {       for(v 
= init_expr; expr comparing v; v = incr) {
      ....
Lcontinue:
      v = incr // e.g. v += 1
   }                               
}
   ....
Note that for loops have a different continue point than do-while and while loops, therefore the initial conversion of if-goto to if-continue statements would have failed. But now that we know the continue point for the loop, we can call StructureBreakContinue for the statements inside the for loop, so that gotos to the continue point are converted into continue statements.
Now that we have converted loop statements we can try to eliminating the remaining goto statements. These should be part of if or if-else statements, so we'll try to identify basic block configurations that allow us to convert if-goto sequences into if or if-else sequences.
The basic idea is the same as we used in structuring loops: we remove all blocks that belong to the if-goto sequence and make them children of the containing if-else statement.
StructureIfElse(Block block) 
The above algorithm converts patterns like the following into 
if-else 
statements.
 
   if(expr)                     
if(!expr) { 
Note that we have to negate the condition to enter the true block (alternatively 
we can swap the true and false blocks and keep the same condition). This pattern 
will eliminate 2 gotos and possibly 2 labels. 
A variant of the previous algorithm has no else portions. That is: 
StructureIfs(Block block) 
The above variant takes care of the following case: 
   if(expr)                     
if(!expr) { 
Not all if-goto pairs will be converted by the above algorithms. Consider the 
following pattern: 
    if(expr1) 
The above sequence is indicative of a single if statement with two conditions. 
That must be converted into a single expression concatenating the 2 conditions 
with an && operator: 
    if(!expr1 && !expr2) { 
There is no limit to the number of expressions that can be collapsed into the 
same if statement, so recognizing the above pattern should be repeated as long 
as we can collapse more and more expressions. Similarly, we can recognize 
sequences of expressions that jump to the true block, and concatenate them using 
the || operator: 
    if(expr1)                   
if(expr1  || 
 
The above algorithms and patterns work fairly well for specific instruction 
sequences. 
However their fairly simplistic approach fails miserably in presence of even the 
slightest disturbance in the instruction sequence. In particular the presence of 
complex expressions prevents the collapsing of if sequences. Consider the 
following sequence: 
    MOV  R1, var_1              
R1 = var_1; 
The sequence maps nicely into the if-or pattern. However, having a more complex 
expression sequence will break the mapping: 
    MOV  R1, var_1               
R1 = var_1;     // basic block 1 
This is because the second basic block has multiple statements as opposed to a 
single if statement. Since an expression cannot contain statements, we cannot 
collapse the two compare instructions into a single if expression. 
 
In order to be able to reduce the number of statements, thus opening the door 
for further statement structuring, we need to collapse simple expressions into 
more complex ones. 
Data flow analysis helps in reducing the 
non-control-flow statements, and opens the door to better statement 
structuring.
 
Next: Data flow analysis
    	
   if block.succ.size != 2
       return false
   trueBlock = block.succ[0]
   falseBlock = block.succ[1]
   if trueBlock.succ.size != 1 or falseBlock.succ.size != 1
       return false
   if falseBlock.succ[0] != trueBlock.succ[0]
       return false
   ifStmt = new Statement(If)
   ifStmt.expr = NegateCondition(block.lastStmt)
   ifStmt.trueBlocks = trueBlock
   ifStmt.falseBlocks = falseBlock
   removeLastGoto(trueBlock, trueBlock.succ[0])
   removeLastGoto(falseBlock, falseBlock.succ[0])
   return true  
      goto Lfalse
   ....                             
....
   goto Lelse                   
}
Lfalse:                         
else {
   ....                             
....
Lelse:                          
}
   ....                         
....  
   if block.succ.size != 2
       return false
   trueBlock = block.succ[0]
   falseBlock = block.succ[1]
   if trueBlock.succ.size == 1 and trueBlock.succ[0] == falseBlock
       ifStmt = new Statement(If)
       ifStmt.expr = NegateCondition(block.lastStmt)
       ifStmt.trueBlocks = trueBlock
       ifStmt.falseBlocks = NULL
       removeLastGoto(trueBlock, falseBlock)
  
      goto Lfalse
   ....                             
....
Lfalse:                         
}
More if structuring
        goto Lfalse
    if(expr2)
        goto Lfalse
    ....
Lfalse:
    ....
        ....
    }
        goto Ltrue
    if(expr2)                      
!expr2) {
        goto Lfalse
Ltrue:
    ....                            
....
Lfalse:                         
}
    ....Limitations
    CMP  R1,#10                 
if(R1 == 10 ||
    JEQ  L1
    CMP  R1,#20                    
R1 == 20) {
    JNE  L2
L1: ....
L2:                             
}
    CMP  R1,#10                  
if(R1 == 10)
    JEQ  L1                          
goto L1;
    MOV  R1, var_2               
R1 = var_2;     // basic block 2
    CMP  R1,#20                  
if(R1 != 20)
    JNE  L2                          
goto L2;
L1: ....                     
L1: ...
L2:                          
L2: