NJU Static Program Analysis 04: Data Flow Analysis II
Abstract
- Understand the three data flow analyses:
- reaching definitions
- live variables
- available expressions
- Can tell the differences and similarities of the three data flow analyses
- Understand the iterative algorithm and can tell why it is able to terminate
Notes
The lecture first introduced the live variables analysis. It is defined as:
Live variables analysis tells whether the value of variable v at program point p could be used(and without any redefinition) along some path in CFG starting at p. If so, v is live at p; otherwise, v is dead at p.
Live variables analysis can be used for register allocations. For example at some point all registers are full and we need to use one, then we should favor using(replace) a register with a dead value.
To implement this, we can apply a state compression method similar to that of reaching variables analysis. Additionally, considering the analysis of program point p needs information of the successor points, we‘d better apply the backward analysis method to lower the complexity.
With the basis of reaching variables analysis, we can easily induce that:
The \(use_B\) here consists variables used before redefinition in \(B\), and \(def_B\) consists variables redefined in \(B\). On these basis, we can design a algorithm handling live variables analysis like:
The algorithm is almost the same to that of reaching definition, so that the proof of its definition and correctness is similar.
The lecture then introduced the available expressions. It is defined as:
An expression x op y is available at program point p if
- all paths from the entry to p must pass through the evaluation of x op y
- after the last evaluation of x op y, there is no redefinition of x op y
So, the available expression analysis tells whether at a program point p, we can replace the expression x op y
by the result of its last evaluation, which indicates that available Expressions analysis can be used for detecting global common sub expressions to accelerate the program calculation. Additionally, as its application acquires, available expressions analysis is a king of must-analysis, or a complete but not sound analysis.
In this analysis, we also use state compression techniques. That is to use a vector of bits not standing for variables, but for different expressions. In this way, we can similarly design a transfer function like:
the \(gen_B\) consists the expression generated in \(B\), and \(kill_B\) consists expressions whose operands are the variables redefined in \(B\).
A different thing in the analysis is that, because of the analysis is a complete one, we should handle the control flow in a stricter way:
Instead of using the operator \(\cup\), which brings about possibility and ambiguity, the \(\cap\) here means a more definite logic, as the word all in definition acquires. It seems that, when we our analysis is to find or prevent errors, it should be a sound(over-approximated) one to guarantee the correctness of the program, and when our analysis is to do optimization, it should be a complete(safe-approximated) one to avoid possible problems.
Finally we got the algorithm of available expressions analysis:
In the last analysis(a pun, forgive me), we can make a comparison of the three kinds of data flow analysis.