Overview of Back-end for CComp
Zhaopeng Li
Software Security Lab.
June 8, 2009
Outline
• Design Points
• Assembly Language : “x86”
• Low-level Intermediate Language
• Future Work
Design Points
• Assembly Language
– Target: SCAP with an x86 abstract machine;
– The program logic may change in a later version;
– Or a different machine may be targeted.
• Low-level Intermediate Language
– Hides some machine-specific details;
– Note that this level may serve only as a helper for
generating code and proofs.
Assembly Language : “x86”
Some Topics about “x86”
• Data Representation
– 32-bit vs “fake” 32-bit
• The bit-level encoding of data is not modeled.
• Integer : 4 bytes
• Pointer : 4 bytes
• Data Alignment
• Callee-saved Registers
– EBX, ESI, EDI, EBP
Some Topics about “x86” (cont.)
• Calling convention:
1. Parameters are passed on the stack,
pushed from right to left; alternatively, the first
three are passed in registers EAX,
ECX, and EDX, and the rest are passed
on the stack;
2. Registers EAX, ECX, and EDX may be used freely by
the callee; any other register must be
saved on the stack and restored (popped) before the
function returns;
3. The return value is stored in register
EAX;
4. The caller cleans up the stack (parameters).
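The stack-based variant of this convention can be illustrated with a small simulation. This is only a sketch: the dictionary-based memory, the helper names `call_with_stack_args` and `sub_callee`, and the placeholder return address are assumptions made for the example, not part of CComp.

```python
# Sketch of the stack-based variant of the calling convention.
# Hypothetical model: memory is a dict from addresses to 4-byte words.

WORD = 4

def call_with_stack_args(regs, mem, args, callee):
    """Caller side: push args right to left, call, then clean up."""
    for a in reversed(args):          # rightmost argument is pushed first
        regs["esp"] -= WORD
        mem[regs["esp"]] = a
    regs["esp"] -= WORD               # `call` pushes the return address
    mem[regs["esp"]] = "ret-addr"     # placeholder value, never inspected
    callee(regs, mem)                 # callee leaves its result in eax
    regs["esp"] += WORD               # `ret` pops the return address
    regs["esp"] += WORD * len(args)   # caller removes the parameters
    return regs["eax"]

def sub_callee(regs, mem):
    """Callee computing arg0 - arg1; args sit just above the return address."""
    arg0 = mem[regs["esp"] + WORD]
    arg1 = mem[regs["esp"] + 2 * WORD]
    regs["eax"] = arg0 - arg1

regs = {"eax": 0, "esp": 1000}
mem = {}
result = call_with_stack_args(regs, mem, [7, 2], sub_callee)
```

Because the arguments are pushed right to left, the first argument ends up at the lowest address, directly above the return address, and the caller's final adjustment brings esp back to its value before the call.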
Some Topics about “x86” (cont.)
Prolog (typical)
_function:
push ebp ; store the old base pointer
mov esp, ebp ; make the base pointer point to the current stack location
sub x, esp ; x is the size, in bytes, of the local variables
(or, equivalently: enter x, 0)
Epilog (typical)
mov ebp, esp ; reset the stack to "clean" away the local variables
pop ebp ; restore the original base pointer
ret ; return from the function
(or, equivalently: leave followed by ret)
[Figure: stack layout at function entry, after stack frame setup, and after the return, showing the positions of esp and ebp relative to the parameters, old eip, old ebp, and local variables.]
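To make the effect of the typical prolog and epilog concrete, the following sketch runs them around an empty function body on a toy register file and memory. The function name `run_frame`, the concrete addresses, and the dictionary model are assumptions for illustration only.

```python
WORD = 4

def run_frame(regs, mem, local_bytes):
    """Execute the typical prolog and epilog around an empty body."""
    # Prolog
    regs["esp"] -= WORD               # push ebp
    mem[regs["esp"]] = regs["ebp"]
    regs["ebp"] = regs["esp"]         # mov esp, ebp
    regs["esp"] -= local_bytes        # sub x, esp
    in_body = dict(regs)              # snapshot of the registers inside the body
    # Epilog
    regs["esp"] = regs["ebp"]         # mov ebp, esp
    regs["ebp"] = mem[regs["esp"]]    # pop ebp
    regs["esp"] += WORD
    regs["esp"] += WORD               # ret pops the old eip
    return in_body

entry = {"esp": 988, "ebp": 2000}     # esp points at the old eip on entry
regs = dict(entry)
mem = {988: "old eip"}
body = run_frame(regs, mem, 12)
```

After the return, ebp holds its entry value again and esp is 4 bytes above its entry value, because `ret` has popped the return address.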
Assembly Abstract Machine “m86”
• Code Heap (C)
– Code storage,
– Unchanged during execution
• Machine State
– Memory (M)
– Register File (R)
– Instruction Pointer (eip)
• The current instruction is c = C(eip)
• Alternatively, use an instruction sequence (I) instead
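A minimal sketch of this machine model follows. It assumes that the code heap is indexed by instruction number rather than byte address, that instructions are encoded as tuples, and that only three instructions are supported; the names `State` and `step` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    M: dict = field(default_factory=dict)   # memory: address -> word
    R: dict = field(default_factory=dict)   # register file: name -> word
    eip: int = 0                            # instruction pointer

def step(C, s):
    """One machine step: fetch c = C(eip) and update the state in place."""
    c = C[s.eip]
    op = c[0]
    if op == "movi":            # movi n, r   (r := n)
        _, n, r = c
        s.R[r] = n
        s.eip += 1
    elif op == "add":           # add r1, r2  (r2 := r2 + r1, destination last)
        _, r1, r2 = c
        s.R[r2] = s.R[r2] + s.R[r1]
        s.eip += 1
    elif op == "jmp":           # jmp b
        s.eip = c[1]
    else:
        raise ValueError(f"unsupported instruction: {c}")

# The code heap is never modified during execution; a tuple enforces that here.
C = (("movi", 3, "eax"), ("movi", 4, "ebx"), ("add", "eax", "ebx"))
s = State()
while s.eip < len(C):
    step(C, s)
```

Keeping C outside the state mirrors the slide: only M, R, and eip change from step to step.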
Assembly Language : “x86”
• “AT&T-syntax”
• Reg. r ::= eax | ebx | ecx | edx | esi | edi | esp | ebp
• FReg. fr ::= sf | zf
• Int. b ::= n (integer)
• Instr. i ::= add r1, r2 | addi n, r | sub r1, r2 | subi n, r
| mul r1, r2 | muli n, r
| mov r1, r2 | movi n, r
| movs r1, n(r2) | movl n(r1), r2
| push r
| pop r
| cmp r1, r2 | cmpi n, r | je r, b | jne r, b | jg r, b | jge r, b
| jmp b
| call b
| ret
| enter n, 0
| leave
| malloc r | free r
Program Logic
• Based on SCAP
• Specification (p, g)
– p : State -> Prop
– g : State -> State -> Prop
• Inference Rules
– Well-formed program
• Well-formed basic block
• Well-formed instruction
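The shape of a specification (p, g) can be sketched with executable predicates. This is an illustration under stated assumptions: a state is reduced to a register file, Prop is approximated by `bool`, the guarantee is the one used for f later in these slides (ebp preserved, esp increased by 4), and the alignment precondition and all function names are made up for the example.

```python
from typing import Callable, Dict

State = Dict[str, int]                      # hypothetical: register file only
Pred = Callable[[State], bool]              # p : State -> Prop
Guarantee = Callable[[State, State], bool]  # g : State -> State -> Prop

def p_f(s: State) -> bool:
    """Example precondition (assumed): the stack pointer is word-aligned."""
    return s["esp"] % 4 == 0

def g_f(s: State, s2: State) -> bool:
    """Example guarantee: ebp is preserved and ret has popped the return address."""
    return s2["ebp"] == s["ebp"] and s2["esp"] == s["esp"] + 4

def satisfies(p: Pred, g: Guarantee, run: Callable[[State], State], s: State) -> bool:
    """Check one execution against (p, g); vacuously true if p does not hold."""
    return (not p(s)) or g(s, run(s))

def well_behaved(s: State) -> State:
    return {"ebp": s["ebp"], "esp": s["esp"] + 4}

def clobbers_ebp(s: State) -> State:
    return {"ebp": 0, "esp": s["esp"] + 4}

start = {"ebp": 2000, "esp": 988}
ok = satisfies(p_f, g_f, well_behaved, start)
bad = satisfies(p_f, g_f, clobbers_ebp, start)
```

Testing single executions like this is far weaker than the inference rules, which establish the guarantee for every state satisfying p; the sketch only shows what p and g talk about.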
Main Objects
• Code Generation
– Minimize the proof size
• E.g., temporary results should be kept in registers rather than on the
stack
• Assertion
– Building (p, g) for each basic block
– Generating (p, g) for each program point
• Proof
– Generating proof for functions/basic blocks
– (reusing the proof of VC in source level)
Assertion Relationship
• Intermediate Language
f : {p} //{q}
Basic block1
L1 : {p1}
Basic block2
• x86 Assembly Language
f : {(p’, g)}
Basic block1
L1 : {(p’1, g1)}
Basic block2
• Relationship
p’ = trans(p) /\ param_p /\ stack-reg_p
g = trans(q) /\ callee-saved-reg_g /\ stack_g
p’1 = trans(p1) /\ param_p1 /\ stack-reg_p1
g1 = ?
Figure Out G
f : {R’(ebp)=R(ebp) /\ R’(esp)=R(esp)+4}
• Code, with the register file named at each program point:
R
push ebp
R0
mov esp, ebp
sub $12, esp
L1 : {g1}
Basic block2
leave
ret
R’
• Step relation for push ebp:
R0(ebp) = R(ebp) /\ R0(esp) = R(esp)-4
• Conjoined with the g of f:
R’(ebp) = R(ebp) /\ R0(ebp) = R(ebp)
/\ R’(esp) = R(esp)+4 /\ R0(esp) = R(esp)-4
• Resulting g0:
R’(ebp) = R0(ebp)
/\ R’(esp) = R0(esp)+8
The method:
1. Get the state relation from the operational-semantics rule of the instruction;
2. Use the g of the previous program point;
3. Do substitution and arithmetic.
Figure Out G (cont.)
f : {R’(ebp)=R(ebp) /\ R’(esp)=R(esp)+4}
• Code, with the register file named at each program point:
R
push ebp
R0
mov esp, ebp
R1
sub $12, esp
L1 : {g1}
Basic block2
leave
ret
R’
• g0 (previous program point):
R’(ebp) = R0(ebp)
/\ R’(esp) = R0(esp)+8
• Step relation for mov esp, ebp:
R1(ebp) = R0(esp) /\ R1(esp) = R0(esp)
• Conjoined:
R’(ebp) = R0(ebp) /\ R1(ebp) = R0(esp)
/\ R’(esp) = R0(esp)+8 /\ R1(esp) = R0(esp)
• Resulting g1 (push ebp stored R0(ebp) at address R0(esp), which now equals R1(ebp)):
R’(ebp) = M1(R1(ebp))
/\ R’(esp) = R1(esp)+8
Figure Out G (cont.)
f : {R’(ebp)=R(ebp) /\ R’(esp)=R(esp)+4}
• Code, with the register file named at each program point:
R
push ebp
R0
mov esp, ebp
R1
sub $12, esp
R2
L1 : {g1}
Basic block2
leave
ret
R’
• g1 (previous program point):
R’(ebp) = M1(R1(ebp))
/\ R’(esp) = R1(esp)+8
• Step relation for sub $12, esp:
R2(ebp) = R1(ebp) /\ R2(esp) = R1(esp)-12
• Conjoined:
R’(ebp) = M1(R1(ebp)) /\ R2(ebp) = R1(ebp)
/\ R’(esp) = R1(esp)+8 /\ R2(esp) = R1(esp)-12
• Resulting g2:
R’(ebp) = M2(R2(ebp))
/\ R’(esp) = R2(esp)+20
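The three-step method can be mechanized for the esp component of g. The sketch below is a simplification under stated assumptions: it tracks only the constant d in R’(esp) = Rk(esp) + d, ignores the ebp and memory components, supports just the instructions of the prolog, and uses the hypothetical names `esp_delta` and `propagate`.

```python
def esp_delta(instr):
    """Step 1: effect of one instruction on esp, from its operational semantics."""
    op = instr[0]
    if op == "push":
        return -4
    if op == "pop":
        return 4
    if op == "sub" and instr[2] == "esp":   # sub $n, esp
        return -instr[1]
    if op == "mov" and instr[2] != "esp":   # mov r1, r2 with r2 != esp
        return 0
    raise ValueError(f"unsupported instruction: {instr}")

def propagate(d, instrs):
    """Given R'(esp) = R(esp) + d at entry, return the offset after each instruction."""
    offsets = []
    for instr in instrs:
        # Steps 2 and 3: substitute R_next(esp) = R_prev(esp) + delta
        # into the previous g and simplify the arithmetic.
        d = d - esp_delta(instr)
        offsets.append(d)
    return offsets

prolog = [("push", "ebp"), ("mov", "esp", "ebp"), ("sub", 12, "esp")]
offsets = propagate(4, prolog)
```

Starting from the offset 4 in the g of f, the computed offsets are 8, 8, and 20, matching the esp conjuncts of g0, g1, and g2.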
Low-level Intermediate Language
Potential Benefits
• Hide some machine-specific things;
• Some optimizations could be done (optional);
• Make the implementation simple and reusable
– (*Note that this level is just a helper to generate
code and proof.*)
– When targeting a different assembly logic, only the code that
translates from this level needs to be added
The Language
• Loc. l ::= r | s
• Int. o,b ::= n (integer)
• Slot. s ::= local(o)
| incoming(o)
| outgoing(o)
• Reg. r ::= r1 | r2 | r3 | … // infinite pseudo-registers
• Instr. i ::= bop(bop, l1, l2, l) | uop(uop, l1, l)
| load(r, o, l)
| store(l, r, o)
| getstack(s, r)
| setstack(r, s)
| call(id, l)
| return r
| malloc(r)
| free(r)
| goto b
| label(b)
| cond(l1, cmp, l2, btrue)
• BinOp. bop ::= add | sub | mul | …
• UnOp. uop ::= minus | …
• Comp. cmp ::= gt | ge | eq | ne | lt | le
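As an illustration of how slots hide machine-specific details, the sketch below lowers getstack and setstack to the movl and movs forms of the x86 grammar. Everything about the frame layout is an assumption for the example (locals below the saved ebp, incoming parameters above the return address, outgoing arguments at the top of the stack), as are the names `Slot`, `slot_address`, `lower_getstack`, and `lower_setstack`; the register is assumed to be a machine register already.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    kind: str    # "local", "incoming", or "outgoing"
    offset: int  # o, in bytes

def slot_address(s: Slot) -> str:
    """Map a slot to an x86 memory operand n(r) under the assumed frame layout."""
    if s.kind == "local":
        return f"{-(s.offset + 4)}(ebp)"   # locals sit below the saved ebp
    if s.kind == "incoming":
        return f"{s.offset + 8}(ebp)"      # skip saved ebp and return address
    if s.kind == "outgoing":
        return f"{s.offset}(esp)"          # arguments being built for a call
    raise ValueError(f"unknown slot kind: {s.kind}")

def lower_getstack(s: Slot, r: str) -> str:
    """getstack(s, r): load slot s into register r."""
    return f"movl {slot_address(s)}, {r}"

def lower_setstack(r: str, s: Slot) -> str:
    """setstack(r, s): store register r into slot s."""
    return f"movs {r}, {slot_address(s)}"

first_param = lower_getstack(Slot("incoming", 0), "eax")
first_local = lower_setstack("eax", Slot("local", 0))
```

With a different target or frame layout, only `slot_address` would need to change, which is the kind of reuse the intermediate level is meant to allow.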
Code Generation (optional)
• Do some optimizations which do not affect the
proof, such as:
– Branch tunneling
– Dead code elimination
• Future optimizations
– Other low-level optimizations may be done here
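Branch tunneling can be sketched on a list of intermediate-language instructions. The tuple encoding and the function name `tunnel_branches` are assumptions for illustration; the pass only redirects branch targets and leaves removal of the now-unused labels and gotos to a later dead code elimination.

```python
def tunnel_branches(code):
    """Redirect every branch whose target label is immediately followed by a goto."""
    # Map each label to the target of the goto that directly follows it, if any.
    forward = {}
    for i, instr in enumerate(code[:-1]):
        nxt = code[i + 1]
        if instr[0] == "label" and nxt[0] == "goto":
            forward[instr[1]] = nxt[1]

    def resolve(b):
        seen = set()
        while b in forward and b not in seen:   # stop on cycles of gotos
            seen.add(b)
            b = forward[b]
        return b

    out = []
    for instr in code:
        if instr[0] == "goto":
            out.append(("goto", resolve(instr[1])))
        elif instr[0] == "cond":                # cond(l1, cmp, l2, btrue)
            out.append(instr[:4] + (resolve(instr[4]),))
        else:
            out.append(instr)
    return out

code = [
    ("cond", "r1", "eq", "r2", "A"),
    ("goto", "B"),
    ("label", "A"),
    ("goto", "B"),
    ("label", "B"),
    ("goto", "C"),
    ("label", "C"),
    ("return", "r1"),
]
tunneled = tunnel_branches(code)
```

In this example every branch ends up targeting C directly, so the intermediate jumps through A and B are never taken.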