Lecture 01 - Introduction
THEORY OF COMPILATION
Eran Yahav

Who?
Eran Yahav
Taub 734
Tel: 8294318
[email protected]
Monday 13:30-14:30
http://www.cs.tecnion.ac.il/~yahave
TAs:
• Adi Sosnovich
• Maya Arbel
• Guy Hefetz

What is a Compiler?
“A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language). The most common reason for wanting to transform source code is to create an executable program.”
-- Wikipedia

What is a Compiler?
[Figure: Source text (txt) → Compiler → Executable code (exe)]
Source language: C, C++, Pascal, Java, Postscript, TeX, Perl, JavaScript, Python, Ruby, Prolog, Lisp, Scheme, ML, OCaml
Target language: IA32, IA64, SPARC, C, C++, Pascal, Java, Java Bytecode, …

What is a Compiler?
[Figure: Source text (txt) → Compiler → Executable code (exe)]
Source text:
    int a, b;
    a = 2;
    b = a*2 + 1;
Executable code:
    MOV R1,2
    SAL R1
    INC R1
    MOV R2,R1

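One way to check that the four instructions compute the same value as the source text is to trace them on a toy register machine. The sketch below is illustrative only: the `run` helper is hypothetical, and it assumes that MOV copies its second operand into the first, SAL shifts left by one bit, and INC adds one. The slides do not define these semantics.

```python
def run(program):
    """Execute (opcode, operands...) tuples on a toy register file (assumed semantics)."""
    regs = {}
    for op, *args in program:
        if op == "MOV":            # MOV dst, src: src is a register name or an integer
            dst, src = args
            regs[dst] = regs[src] if isinstance(src, str) else src
        elif op == "SAL":          # shift left by one bit, i.e. multiply by 2
            regs[args[0]] <<= 1
        elif op == "INC":          # add one
            regs[args[0]] += 1
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs

program = [("MOV", "R1", 2), ("SAL", "R1"), ("INC", "R1"), ("MOV", "R2", "R1")]
regs = run(program)
print(regs)  # {'R1': 5, 'R2': 5}
```

Under these assumptions R2 ends up holding 5, which matches b = a*2 + 1 for a = 2.
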
Anatomy of a Compiler
[Figure: Source text (txt) → Frontend (analysis) → Semantic Representation → Backend (synthesis) → Executable code (exe)]
Source text:
    int a, b;
    a = 2;
    b = a*2 + 1;
Executable code:
    MOV R1,2
    SAL R1
    INC R1
    MOV R2,R1

Modularity
[Figure: frontends SL1, SL2, SL3 each translate one source language (txt) into a shared Semantic Representation; backends TL1, TL2, TL3 each translate it into an executable (exe) for one target]
Source text:
    int a, b;
    a = 2;
    b = a*2 + 1;
Executable code for one target:
    MOV R1,2
    SAL R1
    INC R1
    MOV R2,R1
Executable code for another target:
    SET R1,2
    STORE #0,R1
    SHIFT R1,1
    STORE #1,R1
    ADD R1,1
    STORE #2,R1

Anatomy of a Compiler
[Figure: Source text (txt) → Frontend (analysis) → Semantic Representation → Backend (synthesis) → Executable code (exe)]
• Frontend (analysis): Lexical Analysis, Syntax Analysis (Parsing), Semantic Analysis
• Semantic Representation: Intermediate Representation (IR)
• Backend (synthesis): Code Generation
Source text:
    int a, b;
    a = 2;
    b = a*2 + 1;
Executable code:
    MOV R1,2
    SAL R1
    INC R1
    MOV R2,R1

Interpreter
[Figure: Source text (txt) → Frontend (analysis) → Semantic Representation → Execution Engine; the Execution Engine reads Input and produces Output]
Source text:
    int a, b;
    a = 2;
    b = a*2 + 1;

Compiler vs. Interpreter
[Figure, compiler: Source text (txt) → Frontend (analysis) → Semantic Representation → Backend (synthesis) → Executable code (exe)]
[Figure, interpreter: Source text (txt) → Frontend (analysis) → Semantic Representation → Execution Engine, which reads Input and produces Output]

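Compiler vs. Interpreter
Compiler: b = a*2 + 1; → Frontend (analysis) → Semantic Representation → Backend (synthesis) →
    MOV R1,8(ebp)
    SAL R1
    INC R1
    MOV R2,R1
Running the generated code on input 3 produces output 7.
Interpreter: b = a*2 + 1; → Frontend (analysis) → Semantic Representation → Execution Engine
The execution engine reads input 3 and produces output 7.
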
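The interpreter side of the comparison can be sketched as a tree-walking evaluator: the execution engine walks the semantic representation and computes the result directly, without emitting target code. The nested-tuple node format and the `evaluate` function are assumptions made for illustration, not the representation used in the course.

```python
def evaluate(node, env):
    """Recursively evaluate an expression tree against the variable bindings in env."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    left, right = evaluate(node[1], env), evaluate(node[2], env)
    if kind == "add":
        return left + right
    if kind == "mul":
        return left * right
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical semantic representation of: b = a*2 + 1
expr = ("add", ("mul", ("var", "a"), ("num", 2)), ("num", 1))
env = {"a": 3}                    # the input
env["b"] = evaluate(expr, env)
print(env["b"])  # 7
```

A compiler would instead translate the same tree once into instructions such as the MOV/SAL/INC sequence and run those on each input.
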
Just-in-time Compiler (Java example)
[Figure: Java Source (txt) → Java source to Java bytecode compiler → Java Bytecode (txt) → Java Virtual Machine; the Java Virtual Machine reads Input and produces Output]
Just-in-time compilation: the bytecode interpreter (in the JVM) compiles program fragments during interpretation to avoid expensive re-interpretation.

Why should you care?
• Every person in this class will build a parser some day
  • Or wish she knew how to build one…
• Useful techniques and algorithms
  • Lexical analysis / parsing
  • Semantic representation
  • …
  • Register allocation
• Understand programming languages better
• Understand internals of compilers
• Understand (some) details of target architectures

Why should you care?
[Figure: the compiler sits between source and target and draws on several areas]
• Source: Programming Languages, Software Engineering
• Useful formalisms: Regular expressions, Context-free grammars, Attribute grammars
• Data structures
• Algorithms
• Target: Runtime environment, Garbage collection, Architecture

Course Overview
[Figure: Source text (txt) → Lexical Analysis → Syntax Analysis (Parsing) → Semantic Analysis → Intermediate Representation (IR) → Code Generation → Executable code (exe)]

Journey inside a compiler
Pipeline: Lexical Analysis → Syntax Analysis → Sem. Analysis → Inter. Rep. → Code Gen.
Source text:
    x = b*b - 4*a*c
Token Stream:
    <ID,"x"> <EQ> <ID,"b"> <MULT> <ID,"b"> <MINUS> <INT,4> <MULT> <ID,"a"> <MULT> <ID,"c">

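The token stream above can be reproduced with a small regular-expression lexer. This is a minimal sketch using Python's standard `re` module. The token names follow the slide, while `TOKEN_SPEC`, `MASTER`, and `tokenize` are illustrative names, and the tuple encoding of tokens is an assumption.

```python
import re

# One named group per token class; SKIP swallows whitespace.
TOKEN_SPEC = [
    ("INT",   r"\d+"),
    ("ID",    r"[A-Za-z_]\w*"),
    ("EQ",    r"="),
    ("MULT",  r"\*"),
    ("MINUS", r"-"),
    ("SKIP",  r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Turn source text into a list of token tuples, failing on unknown characters."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "INT":
            tokens.append((kind, int(lexeme)))
        elif kind == "ID":
            tokens.append((kind, lexeme))
        elif kind != "SKIP":
            tokens.append((kind,))
        pos = m.end()
    return tokens

print(tokenize("x = b*b - 4*a*c"))
```

Tools such as lex generate this kind of scanner from a list of regular expressions rather than having it written by hand.
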
Journey inside a compiler
Token stream:
    <ID,"x"> <EQ> <ID,"b"> <MULT> <ID,"b"> <MINUS> <INT,4> <MULT> <ID,"a"> <MULT> <ID,"c">
Syntax Tree:
    expression
        expression
            term
                term
                    factor
                        ID 'b'
                MULT
                factor
                    ID 'b'
        MINUS
        expression
            term
                term
                    term
                        factor
                            ID '4'
                    MULT
                    factor
                        ID 'a'
                MULT
                factor
                    ID 'c'

Journey inside a compiler
Abstract Syntax Tree:
    MINUS
        MULT
            'b'
            'b'
        MULT
            MULT
                '4'
                'a'
            'c'

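A recursive-descent parser is one way to get from the expression to this abstract syntax tree. The sketch below assumes a small grammar (expression → term (MINUS term)*, term → factor (MULT factor)*, factor → ID | INT) and a nested-tuple tree encoding. Both are illustrative choices, not necessarily the grammar used in the course.

```python
import re

def parse(text):
    """Parse MINUS/MULT expressions over identifiers and integers into a nested-tuple AST."""
    # Simplified lexing: findall silently skips characters it does not recognize.
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[*-]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        nonlocal pos
        tok = peek()
        if tok is None or tok in ("*", "-"):
            raise SyntaxError(f"expected operand, got {tok!r}")
        pos += 1
        return tok

    def term():
        nonlocal pos
        node = factor()
        while peek() == "*":          # left-associative multiplication
            pos += 1
            node = ("MULT", node, factor())
        return node

    def expression():
        nonlocal pos
        node = term()
        while peek() == "-":          # left-associative subtraction
            pos += 1
            node = ("MINUS", node, term())
        return node

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("b*b - 4*a*c"))
# ('MINUS', ('MULT', 'b', 'b'), ('MULT', ('MULT', '4', 'a'), 'c'))
```

Because `term` binds tighter than `expression`, multiplication groups before subtraction, which gives the same shape as the tree on the slide.
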
Journey inside a compiler
Annotated Abstract Syntax Tree:
    MINUS (type: int, loc: R1)
        MULT (type: int, loc: R1)
            'b' (type: int, loc: sp+16)
            'b' (type: int, loc: sp+16)
        MULT (type: int, loc: R2)
            MULT (type: int, loc: R2)
                '4' (type: int, loc: const)
                'a' (type: int, loc: sp+8)
            'c' (type: int, loc: sp+24)

Journey inside a compiler
[Figure: the annotated abstract syntax tree for b*b - 4*a*c]
Intermediate Representation:
    R2 = 4*a
    R1 = b*b
    R2 = R2*c
    R1 = R1-R2

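Lowering a tree to this kind of intermediate representation can be sketched as a post-order walk that emits one instruction per operator node. The version below is a simplification and not the slide's method: it uses an unlimited supply of hypothetical temporaries t1, t2, … where the slide reuses R1 and R2, and the `lower` function and tuple encoding are assumptions.

```python
import itertools

def lower(ast):
    """Flatten a nested-tuple AST into three-address instructions (post-order walk)."""
    code = []
    fresh = (f"t{i}" for i in itertools.count(1))   # endless supply of temporaries
    symbols = {"MULT": "*", "MINUS": "-"}

    def visit(node):
        if isinstance(node, str):      # leaf: variable name or literal
            return node
        op, left, right = node
        l, r = visit(left), visit(right)
        dest = next(fresh)
        code.append(f"{dest} = {l} {symbols[op]} {r}")
        return dest

    visit(ast)
    return code

ast = ("MINUS", ("MULT", "b", "b"), ("MULT", ("MULT", "4", "a"), "c"))
for line in lower(ast):
    print(line)
# t1 = b * b
# t2 = 4 * a
# t3 = t2 * c
# t4 = t1 - t3
```

Mapping the temporaries onto a small set of registers such as R1 and R2 is the job of register allocation, which comes later in the pipeline.
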
Journey inside a compiler
Intermediate Representation:
    R2 = 4*a
    R1 = b*b
    R2 = R2*c
    R1 = R1-R2
Assembly Code:
    MOV R2,(sp+8)
    SAL R2,2
    MOV R1,(sp+16)
    MUL R1,(sp+16)
    MUL R2,(sp+24)
    SUB R1,R2

Error Checking
• In every stage…
  • Lexical analysis: illegal tokens
  • Syntax analysis: illegal syntax
  • Semantic analysis: incompatible types, undefined variables, …
• Every phase tries to recover and proceed with compilation (why?)
  • Divergence is a challenge

Errors in lexical analysis
    pi = 3.141.562    → Illegal token
    pi = 3oranges     → Illegal token
    pi = oranges3     → <ID,"pi">, <EQ>, <ID,"oranges3">

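The three cases above can be told apart with two regular expressions, one for numbers and one for identifiers. This is a minimal sketch: the exact token definitions (a number has at most one decimal point, an identifier starts with a letter or underscore) are assumptions, and `classify` is a hypothetical helper that looks at one lexeme at a time.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")      # digits with at most one decimal point
IDENT = re.compile(r"[A-Za-z_]\w*")        # letter or underscore, then letters/digits

def classify(lexeme):
    """Classify one whitespace-delimited lexeme, or raise on an illegal token."""
    if NUMBER.fullmatch(lexeme):
        return ("NUM", lexeme)
    if IDENT.fullmatch(lexeme):
        return ("ID", lexeme)
    raise ValueError(f"illegal token: {lexeme!r}")

for lexeme in ["3.141.562", "3oranges", "oranges3"]:
    try:
        print(classify(lexeme))
    except ValueError as err:
        print(err)
# illegal token: '3.141.562'
# illegal token: '3oranges'
# ('ID', 'oranges3')
```
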
Error detection: type checking
Source text:
    x = 4*a*"oranges"
Annotated tree:
    MULT (type: int, loc: R2)
        MULT (type: int, loc: R2)
            '4' (type: int, loc: const)
            'a' (type: int, loc: sp+8)
        "oranges" (type: string, loc: const)

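A type checker can catch this error by computing a type for every node bottom-up and rejecting operators whose operand types do not fit. The sketch below assumes a rule that MULT accepts only int operands. The node encoding, the `check` function, and the symbol table are illustrative.

```python
def check(node, symbols):
    """Return the type of an expression tree, or raise TypeError on a mismatch."""
    kind = node[0]
    if kind == "int":
        return "int"
    if kind == "str":
        return "string"
    if kind == "var":
        return symbols[node[1]]            # declared type of the variable
    if kind == "MULT":
        left, right = check(node[1], symbols), check(node[2], symbols)
        if left != "int" or right != "int":
            raise TypeError(f"cannot multiply {left} by {right}")
        return "int"
    raise ValueError(f"unknown node kind: {kind}")

symbols = {"a": "int"}
good = ("MULT", ("int", 4), ("var", "a"))          # 4*a
bad = ("MULT", good, ("str", "oranges"))           # 4*a*"oranges"
print(check(good, symbols))                        # int
try:
    check(bad, symbols)
except TypeError as err:
    print("type error:", err)                      # type error: cannot multiply int by string
```
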
The Real Anatomy of a Compiler
Source text (txt)
→ Process text input → characters
→ Lexical Analysis → tokens
→ Syntax Analysis → AST
→ Sem. Analysis → Annotated AST
→ Intermediate code generation → IR
→ Intermediate code optimization → IR
→ Code generation → Symbolic Instructions (SI)
→ Target code optimization → SI
→ Machine code generation → MI
→ Write executable output → Executable code (exe)

Optimizations
• “Optimal code” is out of reach
  • Many problems are undecidable or too expensive (NP-complete)
  • Use approximation and/or heuristics
  • Must preserve correctness, should (mostly) improve code
• Many optimization heuristics
  • Loop optimizations: hoisting, unrolling, …
  • Peephole optimizations
  • Constant propagation
    • Leverage compile-time information to save work at runtime (precomputation)
  • Dead code elimination
• Majority of compilation time is spent in the optimization phase

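Constant propagation combined with constant folding can be sketched on a small expression tree: variables whose values are known at compile time are replaced by those values, and subtrees that become fully constant are evaluated ahead of time. The tuple encoding, the `OPS` table, and the `fold` function are assumptions for illustration. A real optimizer works on the compiler's IR and must also track where each value is still valid.

```python
import operator

OPS = {"ADD": operator.add, "MULT": operator.mul}

def fold(node, consts):
    """Replace known variables by their constants and evaluate constant subtrees."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):                 # variable: propagate if its value is known
        return consts.get(node, node)
    op, left, right = node
    left, right = fold(left, consts), fold(right, consts)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)           # precompute at compile time
    return (op, left, right)

# b = a*2 + 1
expr = ("ADD", ("MULT", "a", 2), 1)
print(fold(expr, {"a": 2}))   # 5
print(fold(expr, {}))         # ('ADD', ('MULT', 'a', 2), 1)
```

With a = 2 known at compile time the whole expression collapses to the constant 5, so no multiplication or addition is left to do at runtime.
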
Machine code generation
• Register allocation
  • Optimal register assignment is NP-Complete
  • In practice, known heuristics perform well
• Assign variables to memory locations
• Instruction selection
  • Convert IR to actual machine instructions
• Modern architectures
  • Multicores
  • Challenging memory hierarchies

Compiler Construction Toolset
• Lexical analysis generators
  • lex
• Parser generators
  • yacc
• Syntax-directed translators
• Dataflow analysis engines

Summary
• A compiler is a program that translates code from a source language to a target language
• Compilers play a critical role
  • Bridge from programming languages to the machine
  • Many useful techniques and algorithms
  • Many useful tools (e.g., lexer/parser generators)
• A compiler is constructed from modular phases
  • Reusable
  • Different front/back ends

Coming up next
• Lexical analysis