2. • Compiler front-end: lexical analysis, syntax analysis, semantic analysis
Tasks: understanding the source code, making sure the source code is
written correctly
• Compiler back-end: Intermediate code generation/improvement, and
Machine code generation/improvement
Tasks: translating the program to a semantically the same program (in
a different language).
Design Of a Compiler
2
3. What is Code Optimization?
• Optimization is a program transformation technique, which tries to improve the code that
consume less resources (i.e. CPU, Memory) and deliver high speed.
• In optimization, high-level general programming constructs are replaced by very efficient low-
level programming codes.
A code optimizing process must follow the three rules given below:
1. The output code must not change the meaning of the program in any way.
Should not change the output produced for any input
Should not introduce an error
2. Optimization should increase the speed of the program and if possible, the program should
demand less resources.
3. Optimization should itself be fast and should not delay the overall compiling process.
3
4. • Source Code:
-Algorithms transformation can produce spectacular improvements
• Intermediate Code:
-Compiler can improve loops, procedure calls and address calculations
-Typically only optimizing compilers include this phase
• Target Code:
- Compilers can use registers efficiently
Optimized code’s features:
• Executes faster
• Code size get reduced
• Efficient memory usage
• Yielding better performance
• Reduces the time and space complexity
4
Improvements can be made at various phases:
5. Organization of an optimizing compiler
5
Control
flow
analysis
Data flow
analysis
Transformation
Code optimizer
Intermediate Code
Generator
Target Code Generator
6. • Flow analysis is a fundamental prerequisite for many important types of code improvement.
• Generally control flow analysis precedes data flow analysis.
• Control flow analysis (CFA) represents flow of control usually in form of graphs, CFA
constructs such as
• control flow graph -graphical representation of control flow or computation during the execution.
• Call graph - represents calling relationships between subroutines.
• Data flow analysis (DFA) is the process of asserting and collecting information prior to
program execution about the possible modification, preservation, and use of certain entities
(such as values or attributes of variables) in a computer program.
6
Flow analysis - Organization of an optimizing compiler
7. Basic Blocks
Basic blocks
• Basic blocks are sequences of intermediate code in which flow of control enters at the beginning and leaves at the
end without any halt or possibility of branching except at the end.
Example
• Sequence of three-address statements forms a basic block:
t1 : = a * a
t2 : = a * b
t3 : = 2 * t2
t4 : = t1 + t3
t5 : = b * b
t6 : = t4 + t5
• Basic blocks are represented as directed acyclic blocks(DAGs), which are in turn represented using the value-
numbering method applied on quadruples
• Optimization is done on these basic blocks
8. A basic block begins in one of the following ways:
• the entry point into the function.
• the target of a branch (can be a label)
• the instruction immediately following a branch or a return
A basic block ends in any of the following ways :
• a jump statement
• a conditional or unconditional branch
• a return statement
9. Basic Block Representation
• A basic block is represented as a record consisting of
1. a count of the number of quadruples in the block
2. a pointer to the leader of the block which is
1. first statement of basic blocks
2. Any statement that is the target of a conditional or unconditional goto is a leader.
3. Any statement that immediately follows a goto or conditional goto statement is a leader.
3. pointers to the predecessors of the block – Previous block of current block
4. pointers to the successors of the block – Next block of current block
• Note that jump statements point to basic blocks and not quadruples so as to make code movement
easy
9
10. Basic Block Example
{
prod =0;
i =1;
do
{
prod =prod+ a[i] * b[i];
i =i+1;
}
while i < = 20
} 10
• Source code for dot product of two vectors a and
b of length 20 prod = 0
i = 1
L1: t1 = 4* i
t2 = a[t1] /*compute a[i] */
t3 = 4* i
t4 = b[t3] /*compute b[i] */
t5 = t2*t4
t6 = prod+t5
prod = t6
t7 = i+1
i = t7
if i<=20 goto L1
• Three Address code of source code
11. CONTROL FLOW GRAPH (CFG)
• A graph representation of three-address statements, called a flow graph.
• The nodes of the CFG are basic blocks
• Nodes in the flow graph represent computations, and the edges represent the flow of control.
• Graph showing the different possible paths of program flow.
• CFG is constructed by dividing the code into basic blocks
• Flow graphs show control flow among basic blocks.
• It is useful for understanding code-generation algorithms, even if the graph is not explicitly
constructed by a code-generation algorithm.
12. 12
L1: t1 = 4* i
t2 = a[t1] /*compute a[i] */
t3 = 4* i
t4 = b[t3] /*compute b[i] */
t5 = t2*t4
t6 = prod+t5
prod = t6
t7 = i+1
i = t7
if i<=20 goto L1
prod = 0
i = 1
Block B1
Block B2
• One node is distinguished as the initial node (B1).
• There is a directed edge B1 → B2, if B2 can
immediately follow B1 in some execution sequence;
i.e.,
• There is a conditional or unconditional jump
from the last statement of B1 to the first
statement of B2, or
• B2 immediately follows B1. so there is an edge
from B1 (last statement) to B2 (first
statement), and B1 does not end in an
unconditional jump.
• B1 is the predecessor of B2, and B2 is a
successor of B1.
CONTROL FLOW GRAPH (CFG)
13. Global Data Flow Analysis
• Collect information about the whole program.
• Distribute the information to each block in the flow graph.
• Data flow information: Information collected by data flow analysis.
• Data flow equations: A set of equations solved by data flow analysis to gather data flow information.
• Data flow analysis should never tell us that a transformation is safe.
• When doing data flow analysis we must be
• Conservative
• Do not consider information that may not preserve the behavior of the program
• Aggressive
• Try to collect information that is as exact as possible, so we can get the greatest benefit from
the optimizations.
13
14. Global Iterative Data Flow Analysis
• Global:
• Performed on the flow graph
• Goal is to collect information at the beginning and end of each basic block
• Iterative:
• Construct data flow equations that describe how information flows through each basic block and solve
them by iteratively converging on a solution.
• Components of data flow equations
• Sets containing collected information
• in set: information coming into the BB from outside (following flow of data)
• gen set: information generated/collected within the BB
• kill set: information that, due to action within the BB, will affect what has been collected outside
the BB
• out set: information leaving the BB
• Functions (operations on these sets)
• Transfer functions describe how information changes as it flows through a basic block
• Meet functions describe how information from multiple paths is combined.
14
15. • A Code optimizer locates between the intermediate code generator and code
generator.
– Can do control flow analysis.
– Can do data flow analysis.
– Does transformations to improve the intermediate code.
15
The Golden Rules of Optimization
The 80/20 Rule
In general, 80% percent of a program’s execution time is spent executing 20%
of the code
16. Major Classifications of Code Optimization techniques
Machine Dependent Optimization
Machine dependent optimizations are based on register allocation and utilization of special machine-instruction sequences.
It involves CPU registers and may have absolute memory references rather than relative references.
Peephole optimization
Register Allocation and Instruction Selection (Special Hardware features)
Machine Independent Optimization
It is a program transformation that improve the target code without taking into consideration any properties (CPU registers
and/or absolute memory locations) of the target machine.
Local optimizations
Global Optimizations – Data flow analysis
Inter-procedural optimizations
Loop optimization
16
17. Scope Of Optimization
• Peephole analysis
• Within one or a few instructions
• Local analysis
• Within a basic block
• Global analysis
• Entire procedure or within a certain scope
• Inter-procedural analysis
• Beyond a procedure, consider the entire program
17
18. Classification of optimization
There are primarily 3 types of optimizations:
(1) Local optimization
• Apply to a basic block in isolation
(2) Global optimization
• Apply across basic blocks
(3) peep-hole optimization
• Apply across boundaries
Most compilers do (1), many do (2) and very few do (3)
18
19. Local optimization
• Optimization performed within a basic block.
• This is simplest form of optimizations
• No need to analyze the whole procedure body.
• Just analyze the basic blocks of the procedure.
The local optimization techniques include:
• Constant Folding
• Constant Propagation
• Algebraic Simplification
• Operator Strength Reduction
• Copy Propagation
• Dead Code Elimination
19
20. Constant Folding
• Evaluate constant expressions at compile time.
20
c:= 1 + 3 c:= 4
!true false
Example:
In the code fragment below, the expression (3 + 5) can be evaluated at compile time and replaced with the constant 8.
int f ()
{
return 3 + 5;
}
Below is the code fragment after constant folding.
int f ()
{
return 8;
}
Expressions with constant operands can be
evaluated at compile time, thus improving run-time
performance and reducing code size by avoiding
evaluation at run-time.
21. Constant Propagation
• Variables that have constant value, e.g. b := 3
• Later uses of b can be replaced by the constant, if no change of b in between.
21
Example:
• In the code fragment below, the value of x can be propagated to the use of x.
x = 3;
y = x + 4;
• Below is the code fragment after constant propagation and constant folding.
x = 3;
y = 7;
b := 3
c := 1 + b
d := b + c
b := 3
c := 1 + 3
d := 3 + c
Constants assigned to a variable can be propagated through the flow graph and substituted at the use of the
variable.
22. Algebraic Simplification
• Use algebraic properties to simplify expressions
• Some expressions can be simplified by replacing them with an equivalent expression that is more efficient.
22
-(-i) i
Example:
The code fragment below contains expressions that can be simplified.
void f (int i)
{
a[0] = i + 0;
a[1] = i * 0;
a[2] = i - i;
a[3] = 1 + i + 1;
}
Below is the code fragment after expression simplification.
void f (int i)
{
a[0] = i;
a[1] = 0;
a[2] = 0;
a[3] = 2 + i;
}
23. Operator Strength Reduction
• Replace expensive operations with simpler ones
• Typical cases of strength reduction occurs in address calculation of array references.
• Example: Multiplications replaced by additions ,
23
y := x * 2 y := x + x
Replacement of an operator with a less costly one.
temp = 5;
for i=1 to 10 for i=1 to 10
{ {
… …
x = i * 5 x = temp
… …
temp = temp + 5
} }
24. Copy Propagation
• Given an assignment x = y, replace later uses of x with uses of y, provided there are no
intervening assignments to x or y.
• Example
x[i] = a; x[i] = a;
sum = x[i] + a; sum = a + a;
• Example
x := y; s := y * f(y)
s := x * f(x)
24
After y is assigned to x, use y to replace x till x is assigned again reduce the copying.
If y is reassigned in between, then this action cannot be performed.
25. Dead Code Elimination
• Dead Code are portion of the program which will not be executed in any path of the program. It can be removed
• Examples:
• No control flows into a basic block
• A variable is dead at a point (i.e) its value is not used anywhere in the program
• An assignment is dead (i.e) assignment assigns a value to a dead variable
• Ineffective statements:
x := y + 1 (x is immediately redefined in 3rd line without use, therefore eliminate)
y := 5 y := 5
x := 2 * z x := 2 * z
25
• A variable is dead if it is never used after last definition
• Eliminate assignments to dead variables
• Need to do data flow analysis to find dead variables
26. • Remove code never reached
26
if (false)
{a := 5}
if (false)
{}
Dead Code Elimination
27. Other types of Local Optimizations
• The following two optimizations can be applied only on DAG or tree representation of
basic block
• Reordering statements that do not depend on one another
• Reordering computations using algebraic laws
Optimization of Basic Blocks
• Many structure preserving transformations can be implemented by construction of DAGs of
basic blocks
27
29. DAG representation of Basic Block (BB)
• Leaves are labeled with unique identifier (variable name or constant)
• Interior nodes are labeled by an operator symbol
• Nodes optionally have a list of labels (identifiers)
• Edges relates operands to the operator (interior nodes are operator)
• Interior node represents computed value
• Identifier in the label are deemed to hold the value
29
30. Example: DAG for BB
30
t1 := 4 * i
t1
*
i
4
t1 := 4 * i
t3 := 4 * i
t2 := t1 + t3
*
i
4
+
t1, t3
t2
if (i <= 20)goto L1
<=
i 20
(L1)
31. Construction of DAGs for BB
• Input: Basic block, B
• Output: A DAG for B containing the following information:
1) A label for each node
2) For leaves the labels are identifiers or constants
3) For interior nodes the labels are operators
Data structure and functions:
• Node:
1) Label: label of the node
2) Left: pointer to the left child node
3) Right: pointer to the right child node
4) List: list of additional labels (empty for leaves)
• Node (id): returns the most recent node created for id. else return undef
• Create(id,l,r): create a node with label id with l as left child and r as right
child. l and r are optional parameters. 31
32. Algorithm:
For each 3AC, A in BB
if A is any of the following forms:
1. x = y op z
2. x = op y
3. x = y
1. if ((ny = node(y)) == undef)
ny = Create (y);
if (A = = type 1) and ((nz = node(z)) == undef)
nz = Create(z);
32
Construction of DAGs for BB
2. If (A = = type 1) //x = y op z
Find a node labelled ‘op’ with left and right as ny and nz respectively [determination of common sub-expression]
If (not found)
n = Create (op, ny, nz);
If (A = = type 2) //x = op y
Find a node labelled ‘op’ with a single child as ny
If (not found)
n = Create (op, ny);
If (A = = type 3) n = Node (y); //x = y
2. Remove x from Node(x).list
Add x in n.list
Node(x) = n;
36. Example: DAG construction from BB
36
t1 := 4 * i
t2 := a [ t1 ]
t3 := 4 * i
t4 := b [ t3 ]
*
i
4
t1, t3
[]
a
t2
[]
b
t4
37. Example: DAG construction from BB
37
t1 := 4 * i
t2 := a [ t1 ]
t3 := 4 * i
t4 := b [ t3 ]
t5 := t2 + t4
*
i
4
t1, t3
[]
a
t2
[]
b
t4
t5
+
38. Example: DAG construction from BB
38
t1 := 4 * i
t2 := a [ t1 ]
t3 := 4 * i
t4 := b [ t3 ]
t5 := t2 + t4
i := t5
*
i
4
t1, t3
[]
a
t2
[]
b
t4
t5,i
+
• Observations:
• A leaf node for the initial value of an id
• A node n for each statement s
• The children of node n are the last definition (prior to s) of the operands of n
39. Optimization of Basic Blocks
• Common sub-expression elimination: by construction of DAG
• Note: for common sub-expression elimination, we are actually targeting for expressions that compute the
same value.
39
a = b + c
b = b – d
c = c + d
e = b + c
Common expressions
(But do not generate the same result)
• DAG representation identifies expressions that yield the same result
a = b + c
b = b – d
c = c + d
e = b + c
b0 c0 d0
+
+
+ -
a b c
e
40. • Dead code elimination: Code generation from DAG eliminates dead code.
40
a := b + c
b := a – d
d := a – d
c := d + c
b is not live
c
a := b + c
d := a - d
c := d + c
b0 c0
d0
+
-
+
a
b, d
×
Optimization of Basic Blocks
41. Global Optimization
• Optimization across basic blocks within a procedure/function
• Could be restricted to a smaller scope, Example: a loop
• Data-flow analysis is done to perform optimization across basic blocks
• Each basic block is a node in the flow graph of the program.
• These optimizations can be extended to an entire control - flow graph
• Most of compiler implement global optimizations with well founded theory and practical gains
41
42. Interprocedural Optimizations
• Spans multiple procedures, files
• Not as popular as global optimizations
• No single theory applicable to multiple scenarios
• Time consuming
42
43. • Optimization technique that operates on the one or few instructions at a time.
• Performs machine dependent improvements
• Peeps into a single or sequence of two to three instructions (peephole) and replaces it by most
efficient alternative (shorter or faster) instructions.
• Peephole is a small moving window on the target systems
• Characteristics of peep-hole optimizations:
Redundant-instruction (loads and stores)elimination
Flow-of-control optimizations - - Elimination of multiple jumps
Elimination of unreachable code
Algebraic simplifications
Reducing operator strength
Use of machine idioms 43
Peep-hole optimization
44. Eliminating Redundant Loads and Stores
• If the target code contains the instruction sequence:
• Instruction 2 can always be removed if it does not have a label.
• If it is labeled, there is no guarantee that step 1 will always be executed
before step 2.
44
MOV R, a
MOV a, R
goto L1
...
...
MOV R, a
L1: MOV a, R
45. Eliminating Multiple Jumps
• If we have jumps to other jumps, then the unnecessary jumps can be
eliminated in either intermediate code or the target code.
• Example: If we have a jump sequence:
45
if a < b goto L1
...
L1: goto L2
can be replaced by:
if a < b goto L2
...
L1: goto L2
If there are no jumps to L1, then it may be possible to eliminate the
statement, provided it is preceded by an unconditional jump.
46. Eliminating Unreachable Code
• An unlabeled instruction that immediately follows an unconditional jump can possibly be
removed, and this operation can be repeated in order to eliminate a sequence of instructions.
46
int debug = 0
if (debug) {
print debugging information
}
this may be translated as
if debug = = 1 goto L1
goto L2
L1: print debugging information
L2:
The statements that print the debugging information are
unreachable and can be eliminated
if debug != 1 goto L2
print debugging information
L2:
47. Strength reduction
• Example:
• Replace X^2 computations by X*X
• Replace multiplication by left shift
• Replace division by right shift
47
48. Using Machine Idioms
• The target machine may have hardware instructions to implement certain specific operations
efficiently.
• Detecting situations that permit the use of these instructions can reduce execution time
significantly.
• For example, some machines have auto-increment and auto-decrement addressing modes.
• Using these addressing modes can greatly improve the quality of the code when pushing
or popping a stack.
• These modes can also be used for implementing statements like a = a + 1.
48
replace Add #1,R
by Inc R
49. Other types of Code optimization
1. Function preserving Transformations
Common Sub-expression Removal
Copy propagation,
Dead-code elimination,
Constant folding
2. Loop Optimization
3. Two types of basic block optimizations
I. Structure-Preserving Transformations
• Common sub-expression elimination
• Dead code elimination
• Renaming of temporary variables
• Interchange of two independent adjacent statements.
II. Algebraic Transformations
49
50. Common Sub expression elimination
• Common Sub expression elimination is a optimization that searches for instances of
identical expressions (i.e. they all evaluate the same value), and
• Analyses whether it is worthwhile replacing with a single variable holding the computed
value.
50
Identify common sub-expression present in different expression, compute once, and use the result in all the places.
Example:
a := b * c temp := b * c
… a := temp
… …
x := b * c + 5 x := temp + 5
51. Common Sub-expression elimination
• Common sub-expression elimination
• Example 1:
a := b + c a := b + c
c := b + c c := a
d := b + c d := a
• Example 2: in array index calculations
• c[i+1] := a[i+1] + b[i+1]
• During address computation, i+1 should be reused
• Not visible in high level code, but in intermediate code
53. Dead code Optimization:
• Dead Code elimination removes code that does not affect a program.
• Removing such code has two benefits.
• It shrinks program size.
• It avoids the executing irrelevant operations, which reduces its running time.
• Two types of Dead Code elimination
• Unreachable Code
• Redundant statement
53
54. Unreachable Code - Dead code Optimization
• In Computer Programming, Unreachable Code or dead code is code that exists in
the source code of a program but can never be executed.
Program Code
if (a>b)
m=a
elseif (a<b)
m=b
elseif (a==b)
m=0
else
m=-1
Optimized Code
if (a>b) m=a
elseif (a<b) m=b
else
m=0
54
55. Redundant Code - Dead code Optimization
• Redundant Code is code that is executed but has no effect on the output
from a program
main()
{
int a, b, c, r;
a=5;
b=6;
c=a + b;
r=2; r++;
printf(“ %d” ,c);
}
Adding time & space complexity
55
56. Loop optimization
• Loop optimization plays an important role in improving the performance of the
source code by reducing overheads associated with executing loops.
• The inner loops where programs tend to spend the bulk of their time.
• The running time of a program may be improved if we decrease the number of instructions
in an inner loop, even if we increase the amount of code outside that loop.
• Loop Optimization techniques:
• Code Motion - moves code outside a loop
• Induction variables elimination - replace variables from inner loop
• Reduction in strength - replaces expensive operation by a cheaper one, such as a
multiplication by an addition
56
57. Code Motion - Loop Optimization
• Example - Computation can be moved to outside of the loop
i = 1
s= 0
do{
s= s +i
a =5
i = i + 1
}
while (i < =n)
i = 1
s= 0
a =5
do{
s= s + i
i = i + 1
}
while (i < =n)
Bringing a=5 outside the do while loop, is called code motion.
57
58. • Example
for (i=0; i<n; i++)
a[i] = a[i] + x/y;
• Three address code
for (i=0; i<n; i++)
{
c = x/y;
a[i] = a[i] + c;
}
c = x/y;
for (i=0; i<n; i++)
a[i] = a[i] + c; 58
Code Motion - Loop Optimization
59. Code hoisting - Loop Optimization
• Code Space reduction: Similar to common sub-expression elimination but with the
objective to reduce code size.
59
Example: Code hoisting
temp : = x ** 2
if (a< b) then if (a< b) then
z := x ** 2 z := temp
else else
y := x ** 2 + 10 y := temp + 10
“x ** 2“ is computed once in both cases, but the code size in the second case reduces.
60. Induction variable elimination
• If there are multiple induction variables in a loop, can eliminate the ones which are used only in
the test condition
60
The code fragment below has three induction
variables (i1, i2, and i3) that can be replaced
with one induction variable
int a[SIZE];
int b[SIZE];
void f (void)
{
int i1, i2, i3;
for (i1 = 0, i2 = 0, i3 = 0; i1 < SIZE; i1++)
a[i2++] = b[i3++];
return;
}
The code fragment below shows the loop
after induction variable elimination.
int a[SIZE];
int b[SIZE];
void f (void)
{
int i1;
for (i1 = 0; i1 < SIZE; i1++)
a[i1] = b[i1];
return;
}
61. • Example
s := 0;
for (i=0; i<n; i++)
{
s := 4 * i;
…
}
61
Induction variable elimination
s := 0;
e := 4*n;
while (s < e)
{
s := s + 4;
}
// i is not referenced in loop
62. Loop Fusion - Loop Optimization
• Example
for (i=0; i<n; i++) {
A[i] = B[i] + 1
}
for (i=0; i<n; i++) {
C[i] = A[i] / 2
}
for (i=0; i<n; i++) {
D[i] = 1 / C[i+1]
}
Before Loop Fusion
for (i=0; i<n; i++) {
A[i] = B[i] + 1
C[i] = A[i] / 2
D[i] = 1 / C[i+1]
}
Is this correct?
Actually, cannot fuse the third loop
for (i=0; i<n; i++) {
A[i] = B[i] + 1
C[i] = A[i] / 2
}
for (i=0; i<n; i++) {
D[i] = 1 / C[i+1]
}
63. Loop unrolling or Loop collapsing - Loop Optimization
• Execute loop body multiple times at each iteration
• Try to get rid of the conditional branches, if possible
• Allow optimization to cross multiple iterations of the loop
• Especially for parallel instruction execution
63
64. Loop unrolling or Loop collapsing - Loop Optimization
64
Example:
In the code fragment below, the double-nested loop on i and j can be collapsed into a single-nested loop.
int a[100][300];
for (i = 0; i < 300; i++)
for (j = 0; j < 100; j++)
a[j][i] = 0;
Here is the code fragment after the loop has been collapsed.
int a[100][300];
int *p = &a[0][0];
for (i = 0; i < 30000; i++)
*p++ = 0;
65. Renaming temporary variables
• A statement t : = b + c ( t is a temporary ) can be changed to u : = b + c (u is a new
temporary) and
• All uses of this instance of t can be changed to u without changing the value of the basic
block.
• Such a block is called a normal-form block.
65
Interchange of two independent adjacent statements
• Suppose a block has the following two adjacent statements:
t1 : = b + c
t2 : = x + y
• We can interchange the two statements without affecting the value of the block if and only
if neither x nor y is t1 and neither b nor c is t2.
66. Reference
• A.V. Aho, M.S. Lam, R. Sethi, J. D. Ullman, Compilers Principles,
Techniques and Tools, Pearson Edition, 2013.
P. Kuppusamy - Lexical Analyzer