Data Structure Analysis:
A Fast and Scalable Context-Sensitive Heap Analysis
Authors: Chris Lattner, Vikram Adve
University of Illinois at Urbana-Champaign
Presenter: Cheng Li
Static Program Analysis Seminar

Outline
● Background
● Data Structure Analysis Overview
● Data Structure Graph
● Construction Algorithm
● Summary

Background
● Data Structure Analysis enables
  – analyses of logical data structures
  – transformations of logical data structures
● Related analysis tools
  – Alias analysis
  – Pointer analysis
  – Shape analysis

Limitations of Existing Tools
● Rarely apply to entire data structures
  – lists, heaps, or graphs
● Do not support type-unsafe programs
  – incompatible accesses to an object
● Rarely handle recursion
● Do not correctly handle function pointers
● Not available for incomplete programs
● Hardly scale to programs with a large number of global variables
● Not practical for use in a commercial compiler

Shape vs. Data Structure Analysis
● Representation of a list with 4 elements
[Figure: side-by-side comparison labeled "Shape analysis" and "Data Structure analysis"; recoverable labels: variable X and node "List: H" with fields List* and int.]

Desired Analysis Tool:
● Full context-sensitivity:
  – Identifying disjoint instances of data structures
  – Keeping track of the calling path
  – i.e., objects are named by their entire acyclic call path
[Figure: functions A, B, and C each create a list L and are connected by function calls; the lists are named A.L, A.B.L, and A.B.C.L after their call paths.]

Desired Analysis Tool:
● Field-sensitivity:
  – Identifying the internal connectivity pattern
  – Distinguishing different structure fields
[Figure: list L with elements 1, 2, 3, and 4.]
● 1 points to 2
● 2 points to 3
● 3 points to 4

Challenges
● Full context-sensitivity
  – The entire call path may be very large
  – Recursion is difficult
● Full field-sensitivity
  – Not efficient in non-type-safe languages

Overview of Data Structure Analysis Algorithm
● Compute a Data Structure Graph for each function in a program
● Identify memory objects
● Capture connectivity patterns
  – Caller and callee
[Figure: function A creates a list L and calls function B, which also creates a list L.]
● With respect to B, A is the caller
● With respect to A, B is the callee

Overview of Data Structure Analysis Algorithm
● Field-sensitivity
  – Type-safe until an inconsistency appears
● Full context-sensitivity
  – Unification-based approach
● Efficient and scalable

Favorite Code
DSG Notation
Data Structure Graph
● A DS graph is built for each function F in a program
● It summarizes the memory objects with
  – Inter-procedural: F is the caller
  – Intra-procedural: F is the callee
● DS node:
  – Represents a set of distinct memory objects
  – e.g., a list, a graph, etc.

Data Structure Graph
● Assumption:
  – A simple type system with integer, float, pointer, structure, array, and function types
● For any type T, fields(T) returns the set of field names of T
  – An array of size k can be represented either as a structure with k fields or as a single field
  – An unknown-size array is represented as a single field

Data Structure Graph
● Virtual registers
  – Represent scalar variables: integer, float, and pointer
  – All arithmetic operations are performed on virtual registers
● Memory locations
  – Heap objects: malloc
  – Stack objects: alloca
  – Global objects: global variables or functions

Data Structure Graph
● The DS graph for a function is a finite directed graph represented as a tuple
  DSG(F) = <N, E, Ev, C>, where:
  – N: a set of DS nodes
    ● variables, data structures, functions
  – E: a set of edges
    ● each edge goes from one field to another field
    ● <n_s, f_s> → <n_d, f_d>
      – n_s, n_d are DS nodes
      – f_s in fields(T(n_s)), f_d in fields(T(n_d))

Data Structure Graph
● The DS graph for a function is a finite directed graph represented as a tuple
  DSG(F) = <N, E, Ev, C>, where:
  – Ev: a set of edges
    ● from a virtual register v to the target field <n, f> pointed to by v
  – C: a set of call nodes, each a tuple (r, f, a1, a2, ..., ak)
    ● r: return value
    ● f: function being called
    ● a1, ..., ak: arguments of the call

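The tuple DSG(F) = <N, E, Ev, C> can be written down as plain C records. The following is only a sketch: all type and function names, the array-based storage, the use of field indices in place of field names, and the MAX_ARGS bound are assumptions made for illustration, not the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS 4   /* illustrative limit, not from the source */

typedef struct DSNode DSNode;

/* A cell <n, f>: field f of node n (fields are numbered from 0 here). */
typedef struct { DSNode *node; unsigned field; } Cell;

/* An edge in E: <n_s, f_s> -> <n_d, f_d>. */
typedef struct { Cell src, dst; } Edge;

/* An edge in Ev: virtual register v -> <n, f>. */
typedef struct { const char *reg; Cell target; } RegEdge;

/* A call node in C: (r, f, a1, ..., ak). */
typedef struct {
    Cell ret;
    Cell callee;
    Cell args[MAX_ARGS];
    size_t num_args;
} CallNode;

struct DSNode {
    const char *type;    /* T(n), kept as a printable name */
    unsigned num_fields; /* number of fields in fields(T(n)) */
    unsigned flags;      /* property flags */
};

/* DSG(F) = <N, E, Ev, C>, stored as caller-provided arrays. */
typedef struct {
    DSNode *nodes;      size_t num_nodes;
    Edge *edges;        size_t num_edges;
    RegEdge *reg_edges; size_t num_reg_edges;
    CallNode *calls;    size_t num_calls;
} DSGraph;

/* Checks the condition on E from the slide: for every edge,
   f_s is a field of T(n_s) and f_d is a field of T(n_d).
   Returns 1 if all edges satisfy it, 0 otherwise. */
int dsg_edges_well_formed(const DSGraph *g) {
    assert(g != NULL);
    for (size_t i = 0; i < g->num_edges; i++) {
        const Edge *e = &g->edges[i];
        if (e->src.field >= e->src.node->num_fields) return 0;
        if (e->dst.field >= e->dst.node->num_fields) return 0;
    }
    return 1;
}
```

A list node whose Next field points back to the node itself would be one DSNode with two fields and a single Edge from field 0 to field 0.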
Graph Nodes and Fields
● A DS node n represents a set of memory objects and is associated with:
  – T(n): type information
  – G(n): a set of global objects
  – flags(n): property flags
    ● H, S, G, U, M, R, C, and O
  – fields(T(n)): the set of fields of the type T(n)

Flags for each node
● Memory allocation classes
  – H: heap-allocated objects (malloc)
  – S: stack-allocated objects (alloca)
  – G: global variables or functions
  – U: unknown objects with incomplete information
  – C: nodes analyzed completely

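The node flags lend themselves to a bit set. Below is a minimal sketch in C, assuming one bit per flag letter; the enum names and the helper ds_flags_to_string are hypothetical and are not taken from the paper or from any existing implementation.

```c
#include <assert.h>
#include <stddef.h>

/* One bit per flag letter (hypothetical encoding). */
enum DSFlag {
    DS_HEAP      = 1 << 0, /* H */
    DS_STACK     = 1 << 1, /* S */
    DS_GLOBAL    = 1 << 2, /* G */
    DS_UNKNOWN   = 1 << 3, /* U */
    DS_MODIFIED  = 1 << 4, /* M */
    DS_READ      = 1 << 5, /* R */
    DS_COMPLETE  = 1 << 6, /* C */
    DS_COLLAPSED = 1 << 7  /* O */
};

/* Writes the letters of the set flags into buf, in the fixed order
   H, S, G, U, M, R, C, O, and returns buf.
   buf must have room for at least 9 bytes. */
char *ds_flags_to_string(unsigned flags, char *buf) {
    static const char letters[] = "HSGUMRCO";
    size_t out = 0;
    assert(buf != NULL);
    for (size_t i = 0; i < 8; i++) {
        if (flags & (1u << i))
            buf[out++] = letters[i];
    }
    buf[out] = '\0';
    return buf;
}
```

Merging two nodes then reduces to a bitwise OR of their flag words, which is why a set-like encoding is convenient here.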
Flags for each node
● Heap, Stack, Global, Unknown examples
#include <stdlib.h>
#include <alloca.h>
typedef struct list {
  struct list *Next;
  int Data;
} list;
int size = 10;
void F1(list *L, void (*F)(int *, int)) {
  F(&L->Data, size);
}
int main() {
  list *X = malloc(sizeof(list));  /* heap object: H */
  list *Y = alloca(sizeof(list));  /* stack object: S */
}
[Figure: X points to node "List: H" (fields List*, int); Y points to node "List: S" (fields List*, int); size is node "int: G"; F points to node "Void: U".]

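Flags for each node
● Modify (M) and Read (R)
typedef struct list {
  struct list *Next;
  int Data;
} list;
int size = 10;
void Next(list *L) {
  L = L->Next;     /* reads the list node */
}
void add(int *X) {
  (*X) += size;    /* reads and modifies the int pointed to by X */
}
[Figure: X points to a node with field int, flagged MR; L points to node "List: R" (fields List*, int).]
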
Type safe vs unsafe
● Type-safe language
  – All accesses to all objects at node n use a consistent type t
  – T(n) = t
● Type-unsafe: challenging
  – Incompatible types
  – Field-sensitivity is expensive
● Solution:
  – Collapse the node if an incompatible type occurs

Type safe vs unsafe
● MergeCells(<n1,f1>, <n2,f2>)
  – Merges two cells pointing to a field.
  – Two cases:
    ● Compatible accesses: n1 = n2, f1 = f2
      – Merge the compatible cells:
        ● Union flags of n1 into flags of n2
        ● Merge the out-edges of <n1, f> with the corresponding fields of n2
        ● Move the in-edges of n1 to the corresponding fields of n2
        ● Discard n1
        ● Return <n2, f2>
    ● Incompatible accesses: n1 != n2 or f1 != f2
      – Collapse n2
int *a = malloc(10 * sizeof(int));
... ...
void *b = a;
b = 0;
Collapse a

Flag for each node
● Collapse (O) a node when a type-safety violation is found
Collapse(node a):
  a) represent a by a single field
  b) reset the type T(a) to void*
  c) flags(a) = flags(a) U {O}

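MergeCells and Collapse, as outlined on the slides above, can be sketched in C roughly as follows. This is a simplified illustration under several assumptions: types are opaque nonzero integer ids, each field has at most one out-edge, and "move in-edges, discard n1" is replaced by a forwarding pointer that is followed on every lookup (a union-find style substitution). All names and the MAX_FIELDS bound are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FIELDS 4          /* illustrative bound, not from the source */
#define FLAG_COLLAPSED 0x80u  /* stands for the O flag; encoding assumed */
#define TYPE_COLLAPSED 0      /* id given to collapsed nodes; real ids are nonzero */

typedef struct Node Node;
typedef struct { Node *node; unsigned field; } Cell;

struct Node {
    int type;               /* T(n) as an opaque, nonzero type id */
    unsigned num_fields;    /* at most MAX_FIELDS */
    unsigned flags;
    Cell out[MAX_FIELDS];   /* out-edge per field; node == NULL means none */
    Node *forward;          /* set once this node is merged into another */
};

Cell merge_cells(Cell c1, Cell c2);

/* Follows forwarding pointers to the node that currently represents n. */
Node *resolve(Node *n) {
    while (n->forward) n = n->forward;
    return n;
}

/* A collapsed node has a single field, so every field index maps to 0. */
static unsigned norm_field(const Node *n, unsigned f) {
    return (n->flags & FLAG_COLLAPSED) ? 0u : f;
}

/* Adds an out-edge to <n, f>; an existing target is merged with the new one. */
static void add_out_edge(Node *n, unsigned f, Cell target) {
    n = resolve(n);
    f = norm_field(n, f);
    if (n->out[f].node) merge_cells(n->out[f], target);
    else n->out[f] = target;
}

/* Collapse: one field, collapsed type, O flag; all out-edges fold into field 0. */
void collapse(Node *n) {
    Cell saved[MAX_FIELDS];
    n = resolve(n);
    if (n->flags & FLAG_COLLAPSED) return;
    unsigned old = n->num_fields;
    for (unsigned f = 0; f < old; f++) {
        saved[f] = n->out[f];
        n->out[f].node = NULL;
    }
    n->flags |= FLAG_COLLAPSED;
    n->type = TYPE_COLLAPSED;
    n->num_fields = 1;
    for (unsigned f = 0; f < old; f++)
        if (saved[f].node) add_out_edge(n, 0, saved[f]);
}

Cell merge_cells(Cell c1, Cell c2) {
    assert(c1.node && c2.node);
    Node *n1 = resolve(c1.node), *n2 = resolve(c2.node);
    unsigned f1 = norm_field(n1, c1.field), f2 = norm_field(n2, c2.field);

    if (n1 == n2) {                       /* same object already */
        if (f1 != f2) collapse(n1);       /* two distinct fields alias */
        n2 = resolve(n2);
        return (Cell){ n2, norm_field(n2, f2) };
    }
    if (n1->type != n2->type || f1 != f2) {   /* incompatible accesses */
        collapse(n1);
        collapse(n2);
        n1 = resolve(n1);
        n2 = resolve(n2);
        if (n1 == n2) return (Cell){ n2, 0 };
        f2 = 0;
    }
    /* Compatible case: fold n1 into n2. */
    n2->flags |= n1->flags;
    n1->forward = n2;                     /* in-edges of n1 now reach n2 */
    for (unsigned f = 0; f < n1->num_fields; f++)
        if (n1->out[f].node) add_out_edge(n2, f, n1->out[f]);
    n2 = resolve(n2);
    return (Cell){ n2, norm_field(n2, f2) };
}
```

The forwarding pointer keeps each merge cheap because no edge lists have to be rewritten; the cost is that every access to a node must go through resolve first.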
Outline
● Background
● Data Structure Analysis Overview
● Data Structure Graph
● Construction Algorithm
  – Local analysis
  – Bottom Up
  – Top Down
● Summary

Construction Algorithm
● DS graphs are created and refined in three steps:
  – Local analysis phase:
    ● Construct the DS graph from intra-procedural connectivity
  – Global analysis phases:
    ● Bottom-Up and Top-Down
    ● Refine the DS graph using inter-procedural connectivity
    ● Caller and callee

Local analysis phase - Example
void addGToList(list *L) {
  do_all(L, addG);
}
[Figure: local graph for function addGToList: a call node with fields r and f; the global function node do_all, labeled "void (list*, void (int*)*): GC"; the argument L; and the global function node addG, labeled "void (int*): G".]

Local analysis phase
● Construct a rough DS graph for each function using local information
  – Start with an empty graph
  – Make nodes for
    ● virtual registers (local variables)
    ● global variables
    ● data structures
  – Create a call node in C for each function call
  – Merge cells pointing to the same objects

Local Graph
int Global = 10;
void addG(int *X) { (*X) += Global; }
void do_all(list *L, void (*FP)(int *)) {
  do {
    FP(&L->Data);
    L = L->Next;
  } while (L);
}
void addGToList(list *L) { do_all(L, addG); }
list *makeList(int Num) {
  list *New = malloc(sizeof(list));
  New->Next = Num ? makeList(Num - 1) : 0;
  New->Data = Num;
  return New;
}
int main() {
  list *X = makeList(10);
  addGToList(X);
  Global = 20;
}
[Figure: nodes created for addG: the global variable Global with node "int: MR"; the virtual register X pointing to node "int: MR".]

Local Graph
[Figure: local graph for function addG: X points to node "int: MR"; Global is node "int: MR".]

Local Graph
int Global = 10;
void addG(int *X) { (*X) += Global; }
void do_all(list *L, void (*FP)(int *)) {
  do {
    FP(&L->Data);
    L = L->Next;
  } while (L);
}
[Figure: nodes created for do_all: the virtual register L points to node "List: R" (fields List*, int); the virtual register &L->Data points to node "int: R"; the function pointer FP points to a node of type void.]

Local Graph
void do_all(list *L, void (*FP)(int *)) {
  do {
    FP(&L->Data);
    L = L->Next;
  } while (L);
}
[Figure: a call node with fields r, f, and a is created for the call through FP, alongside the nodes for the virtual registers L and &L->Data and for the function pointer FP.]

Local Graph - merge
void do_all(list *L, void (*FP)(int *)) {
  do {
    FP(&L->Data);
    L = L->Next;
  } while (L);
}
[Figures: the local graph under construction over several steps; cells that refer to the same objects are merged, so the separate "int: R" node for &L->Data disappears from the graph.]
Final local graph for do_all
[Figure: L points to node "List: R" (fields List*, int); the call node has fields r, f, and a; FP points to a node of type void; &L->Data is attached to the call node's argument a.]

Local Graph
void addGToList(list *L) { do_all(L, addG); }
[Figure: local graph for addGToList: a call node with fields r and f; the global function node do_all, labeled "void (list*, void (int*)*): GC"; the argument L; and the global function node addG, labeled "void (int*): G".]
Limitation: the two callee functions are not resolved, since the local information is incomplete.

Global Analysis Phase
● BU analysis
  – Eliminate incomplete information in the DS graph of a function F
  – Inline operation
    ● Clone the callees' graphs into F
    ● Merge arguments
  – Four cases:
    ● Simplest: no recursion, no function pointers
    ● Function pointers, but no recursion
    ● Recursion without function pointers
    ● Recursion with function pointers

No function pointers, no recursion
Code:
void do_add(int *a) {
  (*a)++;
}
void add(int *b) {
  do_add(b);
}
[Figure: do_add local graph: a points to node "int: MR".]
[Figure: add local graph: b points to node "int: R"; a call node with fields r, f, and arg, where f refers to the function node "void do_add(int *a)".]
BU analysis:
● Copy the callee's graph into the caller's graph
● Merge formal arguments with actual arguments
[Figure: add BU graph: b points to node "int: MR"; the function node "void do_add(int *a)" is marked C.]

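The two BU steps for this simplest case, copying the callee's graph and then merging formal with actual arguments, can be sketched in C as follows. This is a deliberately reduced illustration: each graph node carries only flags, a single callee node is copied instead of the whole reachable subgraph, and merging is done with a forwarding pointer. The names clone_into and merge_formal_with_actual, the flag encoding, and MAX_NODES are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8   /* illustrative bound, not from the source */

enum { FLAG_M = 1 << 0, FLAG_R = 1 << 1 };  /* hypothetical encoding of M and R */

typedef struct Node {
    unsigned flags;
    struct Node *forward;   /* set once the node has been merged away */
} Node;

typedef struct {
    Node nodes[MAX_NODES];
    size_t num_nodes;
} Graph;

/* Follows forwarding pointers to the node that currently represents n. */
Node *resolve(Node *n) {
    while (n->forward) n = n->forward;
    return n;
}

/* Step 1: copy one callee node into the caller's graph.
   callee_node is expected to be a representative (not forwarded). */
Node *clone_into(Graph *caller, const Node *callee_node) {
    assert(caller->num_nodes < MAX_NODES);
    Node *copy = &caller->nodes[caller->num_nodes++];
    copy->flags = callee_node->flags;
    copy->forward = NULL;
    return copy;
}

/* Step 2: merge the cloned formal argument's node into the actual
   argument's node; the flags are unioned and the formal is forwarded. */
Node *merge_formal_with_actual(Node *formal, Node *actual) {
    formal = resolve(formal);
    actual = resolve(actual);
    if (formal != actual) {
        actual->flags |= formal->flags;
        formal->forward = actual;
    }
    return actual;
}
```

Applied to the example above, the node that b points to starts with R in add, the cloned node for a carries M and R from do_add, and after the merge b's node carries both M and R.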
Function pointers, no recursion
Code:
int Global = 10;
void addG(int *X) { (*X) += Global; }
void do_all(list *L, void (*FP)(int *)) {
  do {
    FP(&L->Data);
    L = L->Next;
  } while (L);
}
void addGToList(list *L) { do_all(L, addG); }
BU phase:
● Inline the targets of all function pointers that are known
● The TD phase handles all unknown function pointers
Examples:
● FP cannot be resolved in do_all, since there is no caller information.
● But addG can be resolved in addGToList, since addG is known.

Function pointers, no recursion
[Figure: do_all local graph: L points to node "List: R" (fields List*, int); a call node with fields r, f, and a; FP points to a node of type void; &L->Data is attached to the argument a.]
[Figure: addGToList local graph: a call node; the argument L; the global function node do_all, labeled "void (list*, void (int*)*): GC"; and the global function node addG, labeled "void (int*): G".]
[Figure: addGToList BU graph after inlining do_all: a call node with fields r, f, and a; the function node addG, labeled "void (int*): GC"; L points to node "List: R" (fields List*, int).]
[Figure: addG graph: X points to node "int: MR"; Global is node "int: MR".]
[Figure: final addGToList BU graph after inlining addG: L points to node "List: MR" (fields List*, int); Global is node "int: GR".]

Recursion
[Figure: call graph over functions A, B, C, and D, with a cycle labeled "Recursion".]
● Problem:
  – Strongly Connected Components (SCCs)
    ● Infinite call paths within an SCC
● Solution:
  – Ignore context-sensitivity within an SCC
  – Merge arguments for each call

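Finding the strongly connected components of the call graph is a standard graph problem. The sketch below uses Tarjan's algorithm on an adjacency matrix; the CallGraph layout, the MAX_FUNCS bound, and the function names are assumptions for illustration. A convenient property of Tarjan's algorithm is that components are completed callees-first, so the component ids it assigns already give a bottom-up visiting order.

```c
#include <assert.h>

#define MAX_FUNCS 16   /* illustrative bound, not from the source */

typedef struct {
    int num_funcs;
    int calls[MAX_FUNCS][MAX_FUNCS];  /* calls[i][j] != 0: function i calls j */
} CallGraph;

typedef struct {
    const CallGraph *g;
    int index[MAX_FUNCS];     /* discovery order, -1 if unvisited */
    int low[MAX_FUNCS];       /* lowest index reachable on the stack */
    int on_stack[MAX_FUNCS];
    int stack[MAX_FUNCS];
    int sp, next_index, next_scc;
    int *scc;                 /* output: component id per function */
} Tarjan;

static void visit(Tarjan *t, int v) {
    t->index[v] = t->low[v] = t->next_index++;
    t->stack[t->sp++] = v;
    t->on_stack[v] = 1;
    for (int w = 0; w < t->g->num_funcs; w++) {
        if (!t->g->calls[v][w]) continue;
        if (t->index[w] < 0) {
            visit(t, w);
            if (t->low[w] < t->low[v]) t->low[v] = t->low[w];
        } else if (t->on_stack[w] && t->index[w] < t->low[v]) {
            t->low[v] = t->index[w];
        }
    }
    if (t->low[v] == t->index[v]) {   /* v is the root of a component */
        int w;
        do {
            w = t->stack[--t->sp];
            t->on_stack[w] = 0;
            t->scc[w] = t->next_scc;
        } while (w != v);
        t->next_scc++;
    }
}

/* Fills scc[i] with the component id of function i and returns the number
   of components. Ids are assigned callees-first (reverse topological order). */
int find_sccs(const CallGraph *g, int scc[]) {
    assert(g->num_funcs >= 0 && g->num_funcs <= MAX_FUNCS);
    Tarjan t = { .g = g, .scc = scc };
    for (int v = 0; v < g->num_funcs; v++) t.index[v] = -1;
    for (int v = 0; v < g->num_funcs; v++)
        if (t.index[v] < 0) visit(&t, v);
    return t.next_scc;
}
```

Functions that receive the same component id are mutually recursive; following the slide, a bottom-up pass would treat each such group as a unit and give up context-sensitivity inside it.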
Recursion
int Global = 10;
list *makeList(int Num) {
  list *New = malloc(sizeof(list));
  New->Next = Num ? makeList(Num - 1) : 0;
  New->Data = Num;
  return New;
}
int main() {
  list *X = makeList(10);
  list *Y = makeList(100);
  addGToList(X);
  Global = 20;
}
[Figure: main BU graph: X and Y point to two separate nodes, each labeled "List: MRHC" (fields List*, int); Global is node "int: GRMC".]

Global Analysis Phase
● Top-Down phase
  – Similar to the BU phase, but in reverse order
  – Inline the caller's graph into the callee's
  – Eliminate incomplete information
int Global = 10;
void addG(int *X) { (*X) += Global; }
void do_all(list *L, void (*FP)(int *)) {
  do {
    FP(&L->Data);
    L = L->Next;
  } while (L);
}
void addGToList(list *L) { do_all(L, addG); }

Global graph
● Final result after three steps:
[Figure: main final graph: X and Y point to two separate nodes, each labeled "List: MRHC" (fields List*, int); Global is node "int: GMRC".]

Summary
● Data Structure Analysis
  – Enables analysis and transformation of entire data structures
  – Identifies disjoint memory objects
  – Full context-sensitivity and field-sensitivity
● Data Structure Graph
  – Definition
  – Construction
  – Examples

Reference
● [1] Chris Lattner and Vikram Adve. Data Structure Analysis: A Fast and Scalable Context-Sensitive Heap Analysis.

End
Thanks for your attention!
Questions?