Points-to Analysis as a
System of Linear Equations
Rupesh Nasre.
Computer Science and Automation
Indian Institute of Science
Advisor: Prof. R. Govindarajan
Feb 22, 2010
What is Pointer Analysis?
a = &x;            // a points to x.
b = a;             // a and b are aliases.
if (b == *p) {     // Is this condition always satisfied?
    …
} else {
    …
}
Pointer Analysis is a mechanism to statically
find out run-time values of a pointer.
Why Pointer Analysis?
• For Parallelization.
    fun(p) || fun(q);
• For Optimization.
    a = p + 2;
    b = q + 2;
• For Bug-Finding.
• For Program Understanding.
• ...
Clients of Pointer Analysis.
Placement of Pointer Analysis.
[Figure: Pointer Analysis shown with its clients and the benefits they enable.]
Clients: Parallelizing compiler, Lock synchronizer, Memory leak detector, Data flow analyzer, String vulnerability finder, Affine expression analyzer, Type analyzer, Program slicer.
Benefits: Improved runtime, Secure code, Better compile time, Better debugging.
Normalized Input.
p = &q     address-of
p = q      copy
p = *q     load
*p = q     store
Why as a Linear System?
• Scalability.
  – Code sizes going into billions.
  – Analyses trade off at least one of
    i. memory requirement,
    ii. analysis time,
    iii. precision.
• Linear algebra is a mature topic.
Outline.

Introduction.

First-cut approach.

Prime-factorization approach.

Evaluation.
First-cut Approach: Transformations
p = &q     =>   p = q - 1
p = q      =>   p = q
p = *q     =>   p = q + 1
*p = q     =>   p + 1 = q
Each address-taken variable (&v) would be assigned a unique value.
First-cut Approach.
Statement    Equation      Solution
a = &x;      a = x - 1     a = r - 1
p = &a;      p = a - 1     p = r - 2
b = *p;      b = p + 1     b = r - 1
c = b;       c = b         c = r - 1
x is address-taken and is assigned the unique value x = r.
Solve.
a, b, c point to x (each has value r - 1, one less than x).
p points to a (p = r - 2, one less than a).
Imprecise analysis: because a, b and c all have the value r - 1, p = r - 2 is also read as
p points to b.
p points to c.
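The first-cut example above can be reproduced with a small sketch. It is only an illustration under stated assumptions: the tuple encoding of statements, the function names `solve_first_cut` and `interpret_offsets`, and the choice r = 0 are hypothetical and not taken from the slides; store statements are left out.

```python
# Offsets implied by the first-cut transformations:
#   p = &q -> p = q - 1,   p = q -> p = q,   p = *q -> p = q + 1
DELTA = {"addr": -1, "copy": 0, "load": +1}

def solve_first_cut(stmts, roots):
    """Propagate values through (kind, lhs, rhs) statements.

    roots maps address-taken variables that are never assigned to their
    unique value (here x = r with r = 0, an arbitrary choice).
    """
    value = dict(roots)
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in stmts:
            if rhs in value and lhs not in value:
                value[lhs] = value[rhs] + DELTA[kind]
                changed = True
    return value

def interpret_offsets(value):
    """v points to w whenever value[w] == value[v] + 1."""
    return {v: sorted(w for w in value if value[w] == value[v] + 1)
            for v in value}

stmts = [("addr", "a", "x"),   # a = &x
         ("addr", "p", "a"),   # p = &a
         ("load", "b", "p"),   # b = *p
         ("copy", "c", "b")]   # c = b
values = solve_first_cut(stmts, {"x": 0})
points_to = interpret_offsets(values)
```

With these inputs `points_to["p"]` contains a, b and c, which is the imprecision noted on the slide: any variable whose value is one more than p is reported as a pointee.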
Issues with First-cut Approach.
• Dereferencing: a = &x versus *a = x.
    a = &x   =>   a = x - 1
    *a = x   =>   a + 1 = x
  Semantically different. Mathematically same.
• Multiple assignments: a = &x, a = &y;
    a = &x;   =>   a = x - 1;
    a = &y;   =>   a = y - 1;
  No solution.
• Cyclic assignments: a = &a;
    a = &a;   =>   a = a - 1
  No solution.
• Symmetry of assignment.
    a = b implies b = a.
Outline.

Introduction.

First-cut approach.

Prime-factorization approach.

Evaluation.
Important Ideas.

Address of a variable as a prime number.

Points-to set as a multiplication of primes.

Variable renaming to avoid inconsistency.
Prime-factorization Approach: Transformations
p = &q     =>   p = p_i * prime(&q)
p = q      =>   p = p_i * q
p = *q     =>   p = p_i * (q + 1)
*p = q     =>   handled separately
p_i is the renamed previous value of p (initially 1).
Each address-taken variable (&v) would be assigned a unique prime number.
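A minimal sketch of the encoding idea: a points-to set stored as a product of primes and recovered by divisibility tests. The prime assignment in `PRIME_OF` and the helper names `encode` and `decode` are illustrative assumptions, not part of the slides.

```python
from math import prod

# Hypothetical prime assignment for address-taken variables.
PRIME_OF = {"x": 7, "y": 11, "z": 13}

def encode(pointees):
    """Product of the primes of all pointees; the empty set maps to 1."""
    return prod(PRIME_OF[v] for v in pointees)

def decode(value):
    """Recover the points-to set by testing divisibility by each prime."""
    return {v for v, p in PRIME_OF.items() if value % p == 0}

p_value = encode({"x", "y"})        # 7 * 11
merged = p_value * encode({"z"})    # multiplying by a prime adds a pointee
```

Multiplication plays the role of set union here, which is why a copy such as p = q is written as a product with the previous value of p.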
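Points-to Information Lattice.
[Figure: lattice of products of primes.
 Top:      3*5*7*11*…
 Level 3:  3*5*7, 3*5*11, 3*7*11, 5*7*11, …
 Level 2:  15, 21, 33, 35, 55, 77, …
 Level 1:  3, 5, 7, 11, …
 Bottom:   1
 Precision increases toward the bottom.]
We start with larger primes to avoid composition gap problem.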
Algorithm Outline.
do {
equations = Linearize(constraints);
solution = LinSolve(equations);
points-to = Interpret(solution);
constraints += AddConstraints(store-constraints, points-to);
} while points-to information changes;
Example.
Statement    Equation          Solution    Interpreted
a = &x;      a = a0*17         a = 17      a = 17
p = &a;      p = p0*101        p = 101     p = 101
b = *p;      b = b0*(p+1)      b = 102     b = 17
c = b;       c = c0*b          c = 102     c = 17
&x = 17, &a = 101
a0 = 1, b0 = 1, c0 = 1, p0 = 1
102 => 1 + 101 => 1 dereference on 101 => 1 dereference on &a => a => 17.
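The interpretation step of the example (102 read as one dereference of 101) can be sketched as follows. This is an assumption-laden illustration: the helper names `factor_over_primes` and `interpret` are hypothetical, only the two primes from the example are known, and the sketch relies on the composition-gap assumption, namely that the nearest smaller value which factors completely over the address primes is the intended base.

```python
# Address primes from the example: &x = 17, &a = 101.
ADDR = {"x": 17, "a": 101}
VAR_OF = {prime: var for var, prime in ADDR.items()}

def factor_over_primes(value):
    """Address primes dividing value, or None if something else is left over."""
    factors, rest = [], value
    for prime in ADDR.values():
        if rest % prime == 0:
            factors.append(prime)
            while rest % prime == 0:
                rest //= prime
    return factors if rest == 1 else None

def interpret(value, solution):
    """Rewrite 'prime product + k' as a plain product by dereferencing k times.

    Assumes the values of the dereferenced variables are already plain
    products of address primes.
    """
    derefs = 0
    while factor_over_primes(value - derefs) is None:
        derefs += 1
    current = value - derefs
    for _ in range(derefs):
        nxt = 1
        for prime in factor_over_primes(current):
            nxt *= solution[VAR_OF[prime]]   # value of the variable at that address
        current = nxt
    return current

# Raw solution of the equations with a0 = b0 = c0 = p0 = 1.
raw = {"a": 17, "p": 101, "b": 101 + 1, "c": 101 + 1}
interpreted = {v: interpret(val, raw) for v, val in raw.items()}
```

Note that 102 is divisible by 17, so a plain divisibility test would misread it; the sketch only accepts a value as a points-to set when the address primes account for all of it.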
Solution Properties.
• Integrality.
  – Only addition and multiplication over integers.
• Feasibility.
  – No negative weight cycle.
• Uniqueness.
  – Each variable is defined only once.
Soundness.
If &x = 7, &y = 11 and p points to x and y, then p is a multiple of 77.
• Base: p points to x and y by direct assignment.
• Induction: p points to x and y due to an indirect assignment (copy, load, store).
• Prove that all indirect assignments are safe.
• Argument: Multiplication moves the dataflow fact upwards in the lattice.
Assumption: No problem due to composition gaps.
p1 + k1 is not misinterpreted as p2 + k2.
The assumption can be enforced by careful offline selection of primes.
Precision.
If &x = 7, &y = 11 and p is a multiple of 77, then p points to x and y.
• Argument: Prime factorization is unique.
• Thus, 77 can be decomposed only as 7*11.
• Prove that none of the address-of, copy, load, store statements add extra primes into the composition.
Assumption: No problem due to composition gaps.
p1 + k1 is not misinterpreted as p2 + k2.
The assumption can be enforced by careful offline selection of primes.
Properties.
• If the value of a pointer p is a prime number, then it defines a must-point-to relation, else it is a may-point-to relation.
• If the value of p is 1, then p is unused.
• If pointers p1 and p2 have the same value, then p1 and p2 are pointer equivalent.
• Variables x and y are location equivalent when, for every pointer p, &x dividing the value of p implies that &x*&y also divides it.
• Pointers p1 and p2 are aliases if gcd(p1, p2) != 1.
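The properties above translate directly into integer tests. The sketch below is illustrative only: the function names are hypothetical, the values 7, 11 and 13 are sample address primes, and it assumes each address prime occurs at most once in a pointer's value.

```python
from math import gcd

def is_prime(n):
    """Trial-division primality test, adequate for small illustrative values."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def classify(value):
    """1 means unused, a prime means must-point-to, anything else may-point-to."""
    if value == 1:
        return "unused"
    return "must-point-to" if is_prime(value) else "may-point-to"

def pointer_equivalent(p1, p2):
    """Pointers with the same value have the same points-to set."""
    return p1 == p2

def may_alias(p1, p2):
    """A shared prime factor means a shared pointee."""
    return gcd(p1, p2) != 1
```

For example, with &x = 7, &y = 11 and &z = 13, the values 77 and 91 share the factor 7, so the two pointers may alias through x.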
Outline.

Introduction.

First-cut approach.

Prime-factorization approach.

Evaluation.
Evaluation.
Benchmarks: SPEC 2000, httpd, sendmail.
Configuration: Intel Xeon, 2 GHz clock, 4 MB L2 cache, 3 GB RAM.
Analysis: Context-sensitive, Flow-insensitive.
Analysis Time (seconds).
Benchmark    anders-cs    bloom-cs    linear-cs
gcc          OOM          10237.7     196.62
perlbmk      OOM          2632.04     101.69
vortex       OOM          1998.5      68.32
eon          231.17       1241.6      106.47
gap          144.18       152.1       89.53
parser       55.36        145.78      55.22
vpr          29.7         88.83       47.82
crafty       20.47        46.9        45.79
httpd        17.45        52.79       76.5
sendmail     5.96         25.35       84.76
mesa         1.47         10.04       58.25
ammp         1.12         15.19       19.59
twolf        0.6          5.13        23.96
gzip         0.35         1.81        2.1
equake       0.22         1.1         0.92
art          0.17         2.4         1.26
bzip2        0.15         1.35        1.62
mcf          0.11         5.04        3.4
average      —            925.76      54.66
Memory (MB).
Benchmark    anders-cs    bloom-cs    linear-cs
gcc          OOM          113577      68492
perlbmk      OOM          54008       29864
vortex       OOM          23486       18420
eon          385284       87814       38908
gap          97863        31786       22784
parser       121588       16201       14016
vpr          50210        8901        10612
crafty       15986        4095        16888
httpd        225513       48036       27108
sendmail     197383       49455       27940
mesa         8261         20702       18680
ammp         5844         5746        9976
twolf        1594         12656       15920
gzip         1447         1205        11868
equake       161          1494        12992
art          42           637         9756
bzip2        519          878         10244
mcf          220          1413        8336
average      —            925.76      54.66
Summary.
• We proposed a novel representation of points-to information using prime factorization.
• We solved pointer analysis as a system of linear equations.
• We empirically showed that it is competitive with the state-of-the-art algorithms.
Our Contributions.
• Ordering points-to statements in an intelligent way to improve the analysis time.
• Dynamic partitioning of points-to statements for a prioritized points-to analysis.
• Probabilistic points-to analysis using bloom filters.
• Points-to analysis as a set of linear equations.