Points-to Analysis as a System of Linear Equations
Rupesh Nasre
Computer Science and Automation, Indian Institute of Science
Advisor: Prof. R. Govindarajan
Feb 22, 2010

What is Pointer Analysis?
  a = &x;            // a points to x.
  b = a;             // a and b are aliases.
  if (b == *p) {     // Is this condition always satisfied?
    …
  } else {
    …
  }
Pointer analysis is a mechanism to statically find out the run-time values of a pointer.

Why Pointer Analysis?
• For parallelization: fun(p) || fun(q);
• For optimization: a = p + 2; b = q + 2;
• For bug finding.
• For program understanding.
• ...

Clients of Pointer Analysis. Placement of Pointer Analysis.
[Figure: pointer analysis and its clients (parallelizing compiler, lock synchronizer, memory leak detector, data flow analyzer, string vulnerability finder, affine expression analyzer, type analyzer, program slicer), annotated with the goals improved runtime, secure code, better compile time, and better debugging.]

Normalized Input.
  p = &q    address-of
  p = q     copy
  p = *q    load
  *p = q    store
[Figure: the four statement kinds illustrated on nodes p and q.]

Why as a Linear System?
• Scalability: code sizes are going into the billions, and analyses trade off at least one of (i) memory requirement, (ii) analysis time, and (iii) precision.
• Linear algebra is a mature topic.

Outline.
Introduction.
First-cut approach.
Prime-factorization approach.
Evaluation.

First-cut Approach: Transformations.
  p = &q   =>   p = q - 1
  p = q    =>   p = q
  p = *q   =>   p = q + 1
  *p = q   =>   p + 1 = q
Each address-taken variable (&v) is assigned a unique value.
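Under the first-cut transformations, every statement becomes a difference equation lhs = rhs + k with k in {-1, 0, +1}. The sketch below is a minimal illustration, not the authors' solver: it assumes a hypothetical (kind, lhs, rhs) statement format, solves the equations with a weighted union-find (a standard technique for difference equations of this shape, swapped in here for illustration), requires address-taken variables to receive distinct values, and reads the solution back with the rule "v points to w when v = w - 1".

```python
# Hypothetical encoding of the first-cut transformations: each statement kind
# maps to the constant k in the difference equation  lhs = rhs + k.
OFFSET = {"addr": -1, "copy": 0, "load": +1, "store": -1}  # *p = q: p + 1 = q


class DiffSolver:
    """Weighted union-find: value(v) = value(parent[v]) + offset[v]."""

    def __init__(self):
        self.parent = {}
        self.offset = {}

    def find(self, v):
        """Return (root, k) such that value(v) = value(root) + k."""
        if v not in self.parent:
            self.parent[v], self.offset[v] = v, 0
        p = self.parent[v]
        if p == v:
            return v, 0
        root, k = self.find(p)
        self.parent[v] = root  # path compression
        self.offset[v] += k
        return root, self.offset[v]

    def equate(self, a, b, k):
        """Add value(a) = value(b) + k; return False if it contradicts earlier equations."""
        ra, ka = self.find(a)
        rb, kb = self.find(b)
        if ra == rb:
            return ka == kb + k
        self.parent[ra], self.offset[ra] = rb, kb + k - ka
        return True


def first_cut(stmts):
    """Solve the first-cut system; return points-to sets, or None if there is no solution."""
    s = DiffSolver()
    taken = set()
    for kind, lhs, rhs in stmts:
        if kind == "addr":
            taken.add(rhs)
        if not s.equate(lhs, rhs, OFFSET[kind]):
            return None
    # Address-taken variables must receive distinct values.
    if len({s.find(v) for v in taken}) < len(taken):
        return None
    # Read the solution back: v points to w when value(v) = value(w) - 1.
    names = {v for _, lhs, rhs in stmts for v in (lhs, rhs)}
    pts = {}
    for v in names:
        root, k = s.find(v)
        pts[v] = {w for w in names if s.find(w) == (root, k + 1)}
    return pts
```

On the example a = &x; p = &a; b = *p; c = b; this returns {x} for a, b, and c but {a, b, c} for p, reproducing the imprecision. Both a = &a; and a = &x; a = &y; return None (no solution), and a = &x and *a = x give identical results, which is the dereferencing problem.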
First-cut Approach: Example.
  a = &x;    a = x - 1
  p = &a;    p = a - 1
  b = *p;    b = p + 1
  c = b;     c = b
Solve (taking x = r): a = r - 1, p = r - 2, b = r - 1, c = r - 1.
Reading back "v points to w when v = w - 1": a, b, and c point to x, and p points to a. But because a = b = c = r - 1, p = r - 2 equally says that p points to b and to c, so the analysis is imprecise.

Issues with First-cut Approach.
• Dereferencing: a = &x and *a = x are semantically different, yet they become mathematically the same equation (a = x - 1 and a + 1 = x).
• Multiple assignments: a = &x; a = &y; gives a = x - 1 and a = y - 1, which has no solution because x and y must receive distinct values.
• Cyclic assignments: a = &a; gives a = a - 1, which has no solution.
• Symmetry of assignment: the equation for a = b is symmetric, so it also implies b = a, whereas the assignment only copies b's points-to information into a.

Important Ideas.
• Represent the address of a variable as a prime number.
• Represent a points-to set as a product of primes.
• Rename variables to avoid inconsistency.

Prime-factorization Approach: Transformations.
  p = &q   =>   p = p0 * prime(&q)
  p = q    =>   p = p0 * q
  p = *q   =>   p = p0 * (q + 1)
  *p = q   =>   handled separately
Here p0 is the value of p before the statement (1 if p has no earlier definition). Each address-taken variable (&v) is assigned a unique prime number.

Points-to Information Lattice.
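The lattice orders encoded points-to sets by divisibility: moving up multiplies in more primes (more pointees, less precision), and 1, the empty set, sits at the bottom. A minimal sketch of that encoding follows; the variable names w, x, y, z and the small primes 3, 5, 7, 11 are hypothetical, chosen for readability (the slides note that the analysis itself starts with larger primes).

```python
from math import gcd, prod


def encode(pointees, primes):
    """Product of the pointees' primes; the empty set encodes as 1 (an unused pointer)."""
    return prod(primes[v] for v in pointees)


def decode(value, primes):
    """Points-to set of an encoded value: v is a pointee iff its prime divides the value."""
    return {v for v, pr in primes.items() if value % pr == 0}


def below(a, b):
    """Lattice order: a lies below b iff a divides b (a's pointees are a subset of b's)."""
    return b % a == 0


def join(a, b):
    """Least upper bound of two encodings: their lcm, i.e. the union of the two sets."""
    return a * b // gcd(a, b)
```

For example, with the hypothetical map {"w": 3, "x": 5, "y": 7, "z": 11}, encode({"y", "z"}) is 77 and decode(77) is {"y", "z"}; 7 lies below 77 but 15 does not; and join(15, 21) is 105 = 3*5*7. Unique prime factorization is what makes decode unambiguous.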
[Figure: points-to information lattice ordered by divisibility. From bottom to top: 1; the primes 3, 5, 7, 11, …; products of two primes such as 15, 21, 33, 35, 55, 77, …; products of three primes such as 3*5*7, 3*5*11, 3*7*11, 5*7*11, …; and 3*5*7*11*… at the top. Precision increases toward the bottom.]
We start with larger primes to avoid the composition gap problem.

Algorithm Outline.
  do {
    equations = Linearize(constraints);
    solution = LinSolve(equations);
    points-to = Interpret(solution);
    constraints += AddConstraints(store-constraints, points-to);
  } while (points-to information changes);

Example. With &x = 17, &a = 101, and a0 = b0 = c0 = p0 = 1:
  Statement   Equation        Solution   Interpreted
  a = &x;     a = a0*17       a = 17     a = 17
  p = &a;     p = p0*101      p = 101    p = 101
  b = *p;     b = b0*(p+1)    b = 102    b = 17
  c = b;      c = c0*b        c = 102    c = 17
Interpreting 102: 102 => 1 + 101 => one dereference of 101 => one dereference of &a => a => 17.

Solution Properties.
• Integrality: only addition and multiplication over integers.
• Feasibility: no negative-weight cycle.
• Uniqueness: each variable is defined only once.

Soundness. If &x = 7, &y = 11, and p points to x and y, then p is a multiple of 77.
• Base: p points to x and y by direct assignment.
• Induction: p points to x and y due to an indirect assignment (copy, load, or store); prove that all indirect assignments are safe.
• Argument: multiplication moves the dataflow fact upwards in the lattice.
• Assumption: no problem arises from composition gaps, i.e., p1 + k1 is not misinterpreted as p2 + k2. The assumption can be enforced by careful offline selection of primes.

Precision. If &x = 7, &y = 11, and p is a multiple of 77, then p points to x and y.
• Argument: prime factorization is unique, so 77 can be decomposed only as 7*11.
• Prove that none of the address-of, copy, load, or store statements adds extra primes into the composition.
• Assumption: as for soundness, no composition gaps (p1 + k1 is not misinterpreted as p2 + k2), enforced by careful offline selection of primes.

Properties.
• If the value of a pointer p is a prime number, it defines a must-point-to relation; otherwise it defines a may-point-to relation.
• If the value of p is 1, then p is unused.
• If pointers p1 and p2 have the same value, then p1 and p2 are pointer equivalent.
• Variables x and y are location equivalent when, for every pointer p, &x dividing the value of p implies that &x*&y also divides it.
• Pointers p1 and p2 are aliases if gcd(p1, p2) != 1.

Evaluation.
• Benchmarks: SPEC 2000, httpd, sendmail.
• Configuration: Intel Xeon, 2 GHz clock, 4 MB L2 cache, 3 GB RAM.
• Analysis: context-sensitive, flow-insensitive.

Analysis Time (seconds). OOM = out of memory.
  Benchmark   anders-cs   bloom-cs   linear-cs
  gcc         OOM         10237.7    196.62
  perlbmk     OOM         2632.04    101.69
  vortex      OOM         1998.5     68.32
  eon         231.17      1241.6     106.47
  gap         144.18      152.1      89.53
  parser      55.36       145.78     55.22
  vpr         29.7        88.83      47.82
  crafty      20.47       46.9       45.79
  httpd       17.45       52.79      76.5
  sendmail    5.96        25.35      84.76
  mesa        1.47        10.04      58.25
  ammp        1.12        15.19      19.59
  twolf       0.6         5.13       23.96
  gzip        0.35        1.81       2.1
  equake      0.22        1.1        0.92
  art         0.17        2.4        1.26
  bzip2       0.15        1.35       1.62
  mcf         0.11        5.04       3.4
  average     n/a         925.76     54.66

Memory (KB). OOM = out of memory.
  Benchmark   anders-cs   bloom-cs   linear-cs
  gcc         OOM         113577     68492
  perlbmk     OOM         54008      29864
  vortex      OOM         23486      18420
  eon         385284      87814      38908
  gap         97863       31786      22784
  parser      121588      16201      14016
  vpr         50210       8901       10612
  crafty      15986       4095       16888
  httpd       225513      48036      27108
  sendmail    197383      49455      27940
  mesa        8261        20702      18680
  ammp        5844        5746       9976
  twolf       1594        12656      15920
  gzip        1447        1205       11868
  equake      161         1494       12992
  art         42          637        9756
  bzip2       519         878        10244
  mcf         220         1413       8336
  average     n/a         26782.78   20711.33

Summary.
• We proposed a novel representation of points-to information using prime factorization.
• We solved pointer analysis as a system of linear equations.
• We empirically showed that it is competitive with state-of-the-art algorithms: linear-cs averages 54.66 s against 925.76 s for bloom-cs, and it completes gcc, perlbmk, and vortex, on which anders-cs runs out of memory.

Our Contributions.
• Ordering points-to statements in an intelligent way to improve the analysis time.
• Dynamic partitioning of points-to statements for a prioritized points-to analysis.
• Probabilistic points-to analysis using Bloom filters.
• Points-to analysis as a system of linear equations.
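The do/while loop from the Algorithm Outline slide can be approximated as a fixed-point iteration over the four normalized statement kinds. This is a minimal sketch under stated assumptions, not the authors' implementation: it replaces LinSolve with direct re-evaluation of each statement, combines values with lcm rather than plain multiplication so that revisiting a statement does not square a prime, and omits the renamed versions (p0, a0, ...) and the q + 1 dereference markers. The statement triples, the primes map, and all function names are hypothetical.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple; on products of distinct primes it keeps each prime once."""
    return a * b // gcd(a, b)


def points_to_fixpoint(stmts, primes):
    """Iterate the four normalized statement kinds until no value changes.

    stmts:  hypothetical (kind, lhs, rhs) triples, kind in {"addr", "copy", "load", "store"}
            for p = &q, p = q, p = *q and *p = q respectively.
    primes: hypothetical map from each address-taken variable to a distinct prime.
    Returns variable -> product of its pointees' primes (a missing entry means 1).
    """
    owner = {pr: v for v, pr in primes.items()}
    val = {}

    def get(v):
        return val.get(v, 1)

    def pointees(n):
        # Decode a product by testing divisibility by each known prime.
        return [owner[pr] for pr in owner if n % pr == 0]

    changed = True
    while changed:  # plays the role of "while points-to information changes"
        changed = False
        for kind, p, q in stmts:
            if kind == "addr":
                targets, new = [p], primes[q]
            elif kind == "copy":
                targets, new = [p], get(q)
            elif kind == "load":
                # p receives the values of everything q points to.
                targets, new = [p], 1
                for t in pointees(get(q)):
                    new = lcm(new, get(t))
            elif kind == "store":
                # Store constraints apply once p's pointees are known.
                targets, new = pointees(get(p)), get(q)
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            for t in targets:
                merged = lcm(get(t), new)
                if merged != get(t):
                    val[t] = merged
                    changed = True
    return val


def may_alias(val, p1, p2):
    """Properties slide: p1 and p2 are aliases if gcd(p1, p2) != 1."""
    return gcd(val.get(p1, 1), val.get(p2, 1)) != 1


def is_must(val, v):
    """Properties slide: a prime value defines a must-point-to relation."""
    n = val.get(v, 1)
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
```

On the Example slide's program with &x = 17 and &a = 101, this yields a = 17, p = 101, b = 17, c = 17, matching the interpreted column. For a = &x; p = &a; *p = q; q = &y; with &x = 7, &a = 11, &y = 13, the store makes a = 91 = 7*13, a may-point-to relation to both x and y.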