Finding Application Errors and Security Flaws Using PQL: a Program Query Language
Michael Martin, Benjamin Livshits, Monica S. Lam
Presented by: Sathishkumar
Instructor: Christoph Csallner
Outline
1. Introduction
2. PQL
3. Abstract Execution Trace
4. PQL Query
5. Dynamic matcher
6. Static checker
7. Experimental results
8. Conclusion
1. Introduction
• Program analyzers find a large number of errors in software
• Program checkers are targeted at finding patterns common to many application programs
• Error checkers check whether a program conforms to certain design rules
• The focus here is on sequences of events associated with a set of related objects
2. PQL
• PQL stands for Program Query Language
• It allows programmers to express questions about program execution
• A query looks like a code excerpt corresponding to the shortest amount of code that would violate a design rule
• If a match is found, the query specifies an action to perform
• Matched events may be widely spaced
Techniques used & Results found
• Both static and dynamic analysis
• 6 large real-world open-source Java applications
• Containing nearly 60k classes
• Found 206 major errors, such as:
  - Security flaws
  - Resource leaks
  - Violations of consistency invariants
Focus
• PQL focuses on an important class of error patterns: those that deal with sequences of events associated with a set of related objects
• Events may be scattered throughout different methods
• PQL finds all matches in the program that have equivalent behavior
• On a match, it records relevant information or corrects the erroneous execution
Example – SQL injection vulnerability
• Applications that use user-controlled input strings directly as database query commands are susceptible to SQL injection
• Code fragment in a Java servlet:
  con.execute(request.getParameter("query"));
• This code reads a parameter from an HTTP request and passes it directly to a database backend. By supplying an appropriate query, a malicious user can gain access to unauthorized data, damage the contents of the database, and in some cases even execute arbitrary code on the server.
Resolving SQL injection vulnerability
• To catch this, check for the following in the code:
  - an object r of type HttpServletRequest,
  - an object c of type Connection, and
  - an object p of type String
• The result of invoking getParameter on ‘r’ yields the string ‘p’
• ‘p’ is used as a parameter to an invocation of execute on ‘c’
• If so:
  - Replace the call to ‘execute’ with ‘Util.CheckedSQL’, which validates the query to ensure that it matches a permissible action. If the query is invalid or looks unsafe, the request is not made.
Note: The two events need not happen consecutively in the application.
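The slides do not say how ‘Util.CheckedSQL’ decides that a query is permissible. The following is a minimal sketch of one possible policy, assuming an allow-list of query shapes; the class name CheckedSqlSketch, the method isPermitted, and the regular expression are all hypothetical and are not part of PQL.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the validation step; not the actual Util.CheckedSQL.
class CheckedSqlSketch {
    // Assumed policy: a single SELECT on one table with one numeric filter.
    private static final Pattern ALLOWED =
        Pattern.compile("(?i)^SELECT \\w+ FROM \\w+ WHERE \\w+ = \\d+$");

    // Returns true only if the whole query matches the permitted shape.
    static boolean isPermitted(String query) {
        return query != null && ALLOWED.matcher(query.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPermitted("SELECT name FROM users WHERE id = 42"));
        System.out.println(isPermitted("SELECT name FROM users WHERE id = 42; DROP TABLE users"));
    }
}
```

A replacement action in this style would run the query only when isPermitted returns true and skip the database call otherwise.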
SQL injection query (sample)
Static & Dynamic checkers
• Static checker:
  - Finds all potential matches in the program
  - Uses points-to analysis
  - Results are flow-insensitive with respect to the query
  - Does not ensure that calls occur in the same order
  - Results may contain false positives but no false negatives
• Dynamic checker:
  - Finds matches that occur at runtime
  - Is precise and permits actions to be triggered
  - PQL creates an instrumented version of the input program that reports a runtime match iff there are object instances that match the query
3. Abstract Execution Trace
• The program execution is abstracted as a trace of primitive events, each of which contains a unique event ID, an event type, and a list of attributes.
• Objects are named by unique identifiers.
• PQL focuses on objects, so it matches only against instructions that directly dereference objects.
• PQL currently does not allow references to variables of primitive data types such as integers, floats, and characters.
AET (cont’d)
• Field loads and stores:
  - The attributes of these event types are the source object, target object, and the field name.
• Array loads and stores:
  - The attributes of these event types are the source and target objects. The array index is ignored.
• Method calls and returns:
  - The attributes of these event types are the method invoked, the formal objects passed in as arguments, and the returned object. The return event also includes the ID of its corresponding call event.
• Object creations:
  - The attributes of this event type are the newly returned object and its class.
• End of program:
  - This event type has no attributes and occurs just before the Java Virtual Machine terminates.
AET example
If Len=2: two matches are found in this trace with the simpleSQLInjection query.
4. PQL query
• A PQL query consists of a pattern to be matched on the execution trace and actions to be performed upon a match.
• A match to the query is a set of objects and a subsequence of the trace that together satisfy the pattern.
• Two matches:
  - r=o3, c=o5, p=o4
  - r=o3, c=o5, p=o7
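To make the idea of a match concrete, here is a small sketch that searches a list of call events for the pattern "p = r.getParameter(_)" followed later by "c.execute(p)". The Call class, the findMatches method, and the four-event trace are hypothetical; the trace was constructed only so that it reproduces the two bindings listed above, and it is not the trace from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of matching; not PQL's actual matcher.
class SimpleInjectionMatcher {
    // One method-call event: result = receiver.method(arg).
    static final class Call {
        final String method, receiver, arg, result;
        Call(String method, String receiver, String arg, String result) {
            this.method = method; this.receiver = receiver; this.arg = arg; this.result = result;
        }
    }

    // Reports bindings (r, c, p) where p = r.getParameter(_) precedes c.execute(p).
    static List<String> findMatches(List<Call> trace) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < trace.size(); i++) {
            Call get = trace.get(i);
            if (!get.method.equals("getParameter")) continue;
            // Other events may sit between the two calls, so scan the rest of the trace.
            for (int j = i + 1; j < trace.size(); j++) {
                Call exec = trace.get(j);
                if (exec.method.equals("execute") && exec.arg.equals(get.result)) {
                    matches.add("r=" + get.receiver + ", c=" + exec.receiver + ", p=" + get.result);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Call> trace = Arrays.asList(
            new Call("getParameter", "o3", "o1", "o4"),
            new Call("execute", "o5", "o4", "o6"),
            new Call("getParameter", "o3", "o2", "o7"),
            new Call("execute", "o5", "o7", "o8"));
        for (String m : findMatches(trace)) System.out.println(m);
    }
}
```

Each reported string is one set of bindings; the matched subsequence is the pair of events at positions i and j.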
Query Grammar
• Wildcard “_”
  - A query can use the wildcard symbol “_”, whose different occurrences can be matched to different member names or objects.
• Sequence “a; b”
  - Statement ‘a’ is followed by ‘b’.
  - They need not be contiguous. Any events may occur between them.
• Exclusion “~b”
  - Statement ‘b’ must not occur.
  - “a; ~b; c” matches ‘a’ followed by ‘c’ iff ‘b’ does not occur between them.
• Alternation “|”
  - “a | b” is the statement matching either a or b.
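The sequence and exclusion operators can be illustrated on a trace of plain event names. The sketch below checks the pattern “a; ~b; c”; the class SequenceExclusionSketch and its matches method are hypothetical, and real PQL statements carry object bindings that this simplification ignores.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "a; ~b; c" over event names only.
class SequenceExclusionSketch {
    // True if some 'a' is followed by 'c' with no 'b' between them.
    static boolean matches(List<String> trace, String a, String b, String c) {
        for (int i = 0; i < trace.size(); i++) {
            if (!trace.get(i).equals(a)) continue;
            for (int j = i + 1; j < trace.size(); j++) {
                String event = trace.get(j);
                if (event.equals(b)) break;       // exclusion violated for this 'a'
                if (event.equals(c)) return true; // unrelated events in between are skipped
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches(Arrays.asList("a", "x", "c"), "a", "b", "c"));
        System.out.println(matches(Arrays.asList("a", "b", "c"), "a", "b", "c"));
    }
}
```

Note that a later occurrence of ‘a’ can still start a match after an earlier one was ruled out by ‘b’.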
Query Grammar (cont’d)
• Partial order “a, b, c;”
  - The three statements a, b, and c may match in any order.
• Within construct: “within”
  - Takes a method call event to match and a pattern to match.
  - Requires that the return of the method not occur at any point between the call and the full match of the pattern.
• Checking for potential leaks of file handles:
Query Grammar (cont’d)
Subqueries
• Subqueries allow users to specify recursive event sequences or recursive object relations.
• Subqueries are analogous to recursive functions in a programming language.
• They can return multiple values, which are bound to variables in the calling query.
• By recursively invoking subqueries, each with its own set of variables, queries can match against an unbounded number of objects.
Recursive subquery
Recursive subquery (cont’d)
• Recursion is useful for matching against Java wrappers.
• Java exposes higher-level I/O functions by providing wrappers over base input streams.
• For example, to read Java objects from some socket s, one might first wrap the stream with a BufferedInputStream to cache incoming data, then with an ObjectInputStream to parse the objects from the stream.
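The wrapping described above looks as follows in plain Java. This is a sketch only: a ByteArrayInputStream stands in for the socket's input stream so that the example runs without a network, and the class name WrappedStreamSketch and the roundTrip helper are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class WrappedStreamSketch {
    // Serializes a value, then reads it back through two wrapper layers.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            // Stand-in for s.getInputStream(); no real socket is opened here.
            InputStream base = new ByteArrayInputStream(bytes.toByteArray());
            // base -> BufferedInputStream -> ObjectInputStream: two levels of wrapping.
            try (ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(base))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

The object returned by readObject comes from a stream that is two wrappers away from the base stream, which is why a query that tracks data read from the socket needs to follow an arbitrary chain of wrappers.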
Recursive subquery (cont’d)
• Captures arbitrary levels of wrapping:
  - The base case in the derivedStream subquery declares that any stream can be considered derived from itself.
  - The other case captures a single wrapper and then re-invokes derivedStream recursively.
  - The query first finds all the streams derived from the input stream of a socket, then all objects read from any of the derived streams.
Match found!
• PQL provides two facilities to log information about matches or perform actions.
• executes:
  - Executes a specified method when a match occurs.
• replaces:
  - Replaces an existing statement with a specified method that represents the action to be executed in its place.
• Symbol “*”
  - Indicates that every variable in the match is packaged into a collection that can be handled generically.
  - Example: Util.PrintStackTrace(*)
5. Dynamic matcher
• The approach to finding matches to PQL queries dynamically consists of the following three steps:
  1. Translate queries to state machines.
  2. Instrument the target application to produce the full abstract execution trace.
  3. Use a query recognizer to interpret all the state machines over the execution trace to find all matches.
State machines
• A state machine contains a set of states, which includes a start state, a fail state, and an accept state.
• Each state carries ‘bindings’ with it.
• Bindings:
  - A mapping from variables in a PQL query to objects in the heap at run time.
• A state transition specifies the event on which, given the current state and current bindings, the machine moves to the next state with a new set of bindings.
• State transitions generally represent a single primitive statement corresponding to a single event in the execution trace.
Translate Query to State machines
query main()
uses Object x, f;
matches {
    x = getParameter(_) | x = getHeader(_);
    f := derived(x);
    execute(f);
}
query derived(Object x)
uses Object t;
returns Object y;
matches {
    { y := x; }
  | { t = x.toString(); y := derived(t); }
  | { t.append(x); y := derived(t); }
}
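State Machine – Query main()
• Transitions, in order: x = getParameter(_) or x = getHeader(_); then f := derived(x); then execute(f)
• The diagram also contains edges labeled *
State Machine – Query derived()
• Three alternative paths:
  - y := x
  - t = x.toString(), then y := derived(t)
  - t.append(x), then y := derived(t)
• The diagram also contains edges labeled *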
o1 = getHeader(o2)
• main(): the states start with empty bindings {}; the transition x = getHeader(_) fires and produces { x=o1 }, which is passed to f := derived(x)
derived(o1)
• derived(): all branches start with {x=o1}; the base case y := x produces {x=y=o1}
o1 = getHeader(o2)
• main(): derived(x) has returned y=o1, so f := derived(x) produces {x=o1,f=o1}, which now waits for execute(f)
derived(o1), o3.append(o1)
• derived(): with {x=o1}, the transition t.append(x) fires and produces {x=o1, t=o3}, which is passed to y := derived(t)
derived(o3)
• derived(): all branches start with {x=o3}; the base case y := x produces {x=y=o3}
o1 = getHeader(o2), o3.append(o1)
• derived(): the call made for x=o1 now holds two sets of result bindings, {x=y=o1} and {x=o1, y=t=o3}
execute(o3)
• Trace: o1 = getHeader(o2); o3.append(o1); o3.append(o4); o5 = execute(o3)
• main(): f := derived(x) has produced {x=o1,f=o1} and {x=o1,f=o3}; the event execute(o3) fires execute(f) with {x=o1,f=o3}, so a match is found
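The walkthrough can be approximated in a few lines of Java. This is a deliberately simplified sketch, not PQL's state-machine interpreter: instead of running one machine per subquery call, it keeps a map from each derived object to the source x it came from and updates that map in a single pass. The names DerivedMatcherSketch, Event, and findMatches are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplification of the main()/derived() queries.
class DerivedMatcherSketch {
    // One call event: result = receiver.method(arg); unused slots are null.
    static final class Event {
        final String method, receiver, arg, result;
        Event(String method, String receiver, String arg, String result) {
            this.method = method; this.receiver = receiver; this.arg = arg; this.result = result;
        }
    }

    // Returns one "x=..., f=..." binding per execute(f) whose argument derives from a source x.
    static List<String> findMatches(List<Event> trace) {
        Map<String, String> sourceOf = new LinkedHashMap<>(); // derived object -> originating x
        List<String> matches = new ArrayList<>();
        for (Event e : trace) {
            switch (e.method) {
                case "getParameter":
                case "getHeader":   // x = getParameter(_) | x = getHeader(_)
                    sourceOf.put(e.result, e.result);
                    break;
                case "toString":    // t = x.toString(); y := derived(t)
                    if (sourceOf.containsKey(e.receiver)) sourceOf.put(e.result, sourceOf.get(e.receiver));
                    break;
                case "append":      // t.append(x); y := derived(t)
                    if (sourceOf.containsKey(e.arg)) sourceOf.put(e.receiver, sourceOf.get(e.arg));
                    break;
                case "execute":     // execute(f)
                    if (sourceOf.containsKey(e.arg)) matches.add("x=" + sourceOf.get(e.arg) + ", f=" + e.arg);
                    break;
                default:
                    break;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Event> trace = Arrays.asList(
            new Event("getHeader", null, "o2", "o1"),
            new Event("append", "o3", "o1", null),
            new Event("append", "o3", "o4", null),
            new Event("execute", null, "o3", "o5"));
        System.out.println(findMatches(trace));
    }
}
```

On the four-event trace from the slides the sketch reports the binding x=o1, f=o3. One limitation of this shortcut is that it remembers a single source per object, whereas the state machines keep every set of bindings.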
6. Static checker
• Uses a pointer analysis technique
  - A static code analysis technique that establishes which pointers, or heap references, can point to which variables or storage locations.
• The points-to information is stored in a deductive database called bddbddb.
• The data are compactly represented with binary decision diagrams (BDDs) and can be accessed efficiently with queries written in the logic programming language Datalog.
‘bddbddb’ Database
• All inputs and results for the static analyzer are stored as relations in the ‘bddbddb’ database.
• The database includes bytecodes B, variables V, methods M, contexts C, integers Z, and heap objects H.
• A context represents one of the call chains that can occur in the program.
• The source program is represented as a number of input relations:
  - actual: parameter passing
  - ret: method returns
  - fldld: field loads
  - fldst: field stores
  - arrayld: array loads
  - arrayst: array stores
Datalog
• A Datalog program P consists of a set of domains D, a set of relations R, and a set of rules Q.
Datalog (cont’d)
• vP0(v, h): the set of initial points-to relations; holds when an operation places a reference to heap object ‘h’ in variable ‘v’. Ex: s = new String()
• store(x, f, y): x.f = y
• load(x, f, y): y = x.f
• assign(x, y): x = y
• vP(v, h): true if variable v may point to heap object h at any point during program execution
• hP(h1, f, h2): true if heap object field h1.f may point to heap object h2
• Rule 1: if v has a reference to heap object h, then v can point to h
• Rule 2: if variable v2 can point to object h and v1 includes v2, then v1 can also point to h
• Rule 3: for v1.f = v2, if v1 can point to h1 and v2 can point to h2, then h1.f can point to h2
• Rule 4: for v2 = v1.f, if v1 can point to h1 and h1.f can point to h2, then v2 can point to h2
Java to Input relations
• Domain V contains values va, vb, vd representing variables a, b, d
• Domain H contains values h1 , h3 representing objects allocated on lines 1,3
• Domain F consists of value ‘name’, representing ‘name’ field of a Dog object
• Initial points-to relations in vP0 are (va, h1) and (vd, h3)
• The program has
• one assignment operation: assign(vb, va)
• one store operation: store(vd, name, vb)
Java to Input relations (cont’d)
• Rule 1 gives:
  - vP(va, h1) and vP(vd, h3) are true
• Rule 2 gives:
  - vP(vb, h1) is true, since assign(vb, va) and vP(va, h1) are true
• Rule 3 gives:
  - hP(h3, name, h1) is true, since store(vd, name, vb), vP(vd, h3), and vP(vb, h1) are true
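The four rules can be evaluated by repeating them until no new facts appear. The sketch below does this naively with string-encoded facts and hash sets; it is an assumption-laden illustration, not how bddbddb works (bddbddb stores relations as BDDs). The class PointsToSketch, the solve method, and the Datalog-style rule text in the comments are written for this example from the prose rules above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;

// Hypothetical naive fixpoint evaluation of Rules 1-4; facts are encoded as "a,b" strings.
class PointsToSketch {
    final Set<String> vP = new HashSet<>(); // vP(v, h)
    final Set<String> hP = new HashSet<>(); // hP(h1, f, h2)

    void solve(String[][] vP0, String[][] assign, String[][] store, String[][] load) {
        // Rule 1: vP(v, h) :- vP0(v, h).
        for (String[] f : vP0) vP.add(f[0] + "," + f[1]);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Rule 2: vP(v1, h) :- assign(v1, v2), vP(v2, h).
            for (String[] a : assign)
                for (String fact : new ArrayList<>(vP)) {
                    String[] p = fact.split(",");
                    if (p[0].equals(a[1])) changed |= vP.add(a[0] + "," + p[1]);
                }
            // Rule 3: hP(h1, f, h2) :- store(v1, f, v2), vP(v1, h1), vP(v2, h2).
            for (String[] s : store)
                for (String f1 : vP)
                    for (String f2 : vP) {
                        String[] p1 = f1.split(","), p2 = f2.split(",");
                        if (p1[0].equals(s[0]) && p2[0].equals(s[2]))
                            changed |= hP.add(p1[1] + "," + s[1] + "," + p2[1]);
                    }
            // Rule 4: vP(v2, h2) :- load(v1, f, v2), vP(v1, h1), hP(h1, f, h2).
            for (String[] l : load)
                for (String f1 : new ArrayList<>(vP))
                    for (String hf : hP) {
                        String[] p1 = f1.split(","), q = hf.split(",");
                        if (p1[0].equals(l[0]) && q[0].equals(p1[1]) && q[1].equals(l[1]))
                            changed |= vP.add(l[2] + "," + q[2]);
                    }
        }
    }

    public static void main(String[] args) {
        PointsToSketch s = new PointsToSketch();
        s.solve(new String[][] {{"va", "h1"}, {"vd", "h3"}},   // vP0
                new String[][] {{"vb", "va"}},                 // assign(vb, va)
                new String[][] {{"vd", "name", "vb"}},         // store(vd, name, vb)
                new String[][] {});                            // no loads in the example
        System.out.println(s.vP + " " + s.hP);
    }
}
```

Run on the input relations of the Dog example, the sketch derives vP(vb, h1) and hP(h3, name, h1), matching the facts listed above.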
PQL to Datalog
PQL to Datalog (cont’d)
• The query is represented as a number of input relations:
  - actual: parameter passing
  - ret: method returns
  - fldld: field loads
  - fldst: field stores
  - arrayld: array loads
  - arrayst: array stores
PQL to Datalog (cont’d)
• The Datalog rule says that an object h is a cause of an injection if b1 is a call to getParameter, b2 is a call to execute, and the return result of getParameter, v1, in some context c1 points to the same heap object h as v2, the first parameter of the call to execute in some context c2.
• Here the result of getParameter() is the input to execute(), so a match is found.
7. Experimental results
Application       No. of classes   No. of bugs found
Eclipse           19,439           192
personalblog       5,236             2
road2hibernate     7,062             1
snipsnap          10,851             8
roller            16,359             1
webgoat            1,021             2
TOTAL             59,968           206
Major Error patterns
Important error patterns found by PQL:
• Serialization errors
  - A data corruption bug in web servers
  - Ex: do not store an object of type X in Y
• SQL injections
  - A major threat to the security of database servers
• Mismatched method pairs
  - Cause resource leaks and data structure inconsistencies
  - “A call to method A must always be followed by a call to method B”
  - Ex: lock(); ... unlock();
• Lapsed listeners
  - A common memory leak pattern in Java that may lead to resource exhaustion and crashes in long-running applications
8. Conclusion
• PQL is a Program Query Language
• It allows programmers to express questions about program execution
• A query looks like a code excerpt corresponding to the shortest amount of code that would violate a design rule
• If a match is found, the query specifies an action to perform
• Matched events may be widely spaced
• Static analysis can solve the serialization error query
• Dynamic analysis handles complex queries such as mismatched method pairs and lapsed listeners
• 206 bugs found in 60k classes
References
• Michael Martin, Benjamin Livshits, and Monica S. Lam. Finding Application Errors and Security Flaws Using PQL: a Program Query Language. Computer Science Department, Stanford University.
• research.microsoft.com/~livshits/papers/ppt/oopsla05.ppt
• John Whaley, Dzintars Avots, Michael Carbin, and Monica S. Lam. Using Datalog with Binary Decision Diagrams for Program Analysis. Computer Science Department, Stanford University, Stanford, CA 94305, USA.
Q?
QUESTIONS???
THANK YOU!!!