Download CSE244 Compiler (a.k.a. Programming Language Translation)

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Subroutine wikipedia , lookup

Library (computing) wikipedia , lookup

Indentation style wikipedia , lookup

Go (programming language) wikipedia , lookup

C syntax wikipedia , lookup

Structured programming wikipedia , lookup

Comment (computer programming) wikipedia , lookup

C++ wikipedia , lookup

Coding theory wikipedia , lookup

C Sharp syntax wikipedia , lookup

GNU Compiler Collection wikipedia , lookup

Assembly language wikipedia , lookup

C Sharp (programming language) wikipedia , lookup

Program optimization wikipedia , lookup

Interpreter (computing) wikipedia , lookup

Compiler wikipedia , lookup

Name mangling wikipedia , lookup

Cross compiler wikipedia , lookup

Transcript
CSE4100
Compiler
(a.k.a. Programming Language Translation)
Notes credit go to
Me
Laurent Michel
Aggelos Kiayias
Steven Demurjian
Robert La Barre
CSE
CSE
CSE
CSE
UTRC
Overview
• Objectives
• Structure of the course
• Evaluation
• Compiler Introduction
– A compiler bird’s eye view
• Lexical analysis
• Parsing (Syntax analysis)
• Semantic analysis
• Code generation
• Optimization
CSE244 Compilers
2
Information
• Course web page
– Will be on HuskyCT
– Currently
• http://www.engr.uconn.edu/~weiwei/
• Instructor
– Wei Wei
– [email protected]
• Office hour
– TuTh 3:30pm ~ 4:30pm
– Wednesday 1:30pm ~ 4:30pm
CSE244 Compilers
3
Objectives
• Compilers are….
– Ubiquitous in Computer Science
– Central to symbolic processing
– Relates to
• Theory of computing
• Practice of computing
• Programming languages
• Operating Systems
• Computer Architecture
CSE244 Compilers
4
Purpose
• Simple intent
– Translate
• From
Source Language
• To
Target Language
CSE244 Compilers
5
Translate… Why?
• Languages offer
– Abstractions
– At different levels
• From low
– Good for machines….
• To
high
– Good for humans….
Let the computer
Do the heavy lifting.
CSE244 Compilers
6
Translate… How ?
• Three approaches
– Interpreted
– Compiled
– Mixed
CSE244 Compilers
7
Interpreter
• Motivation…
– Easiest to implement!
• Upside ?
• Downside ?
• Phases
– Lexical analysis
– Parsing
– Semantic checking
– Interpretation
CSE244 Compilers
8
Compiler
• Motivation
– It is natural!
• Upside?
• Downside?
• Phases
– [Interpreter]
– Code Generation
– Code Optimization
– Link & load
CSE244 Compilers
9
Mixed
• Motivation
– The best of two breeds…
• Upside ?
• Downside?
CSE244 Compilers
10
Objectives
• Learn about compilers because…
– It helps to better program
– Understand the tools of the trade
– Many languages are compiled
•
•
•
•
•
•
Programming Languages
Communication Languages
Presentation Languages
Hardware Languages
Formatting Languages
Query Languages
[C,C#,ML,LISP,…]
[XML,HTML,…]
[CSS,SGML,…]
[VHDL,…]
[Postscript,troff,LaTeX,…]
[SQL & friends]
– Many assistive tools use compiler technology…
• Syntax highlighting
• Type assist - type completion
– You may write/use compiler technology!
CSE244 Compilers
11
Overview
• Objectives
• Structure of the course
• Evaluation
• Compiler Introduction
– A compiler bird’s eye view
• Lexical analysis
• Parsing
• Semantic analysis
• Code generation
• Optimization
CSE244 Compilers
12
Course structure
• A reflection of the topic
– Lexical analysis
– Parsing
– Semantic analysis
– Midterm
– Runtime structures
– Intermediate code generation
– Machine code generation
– Optimization
– Final
CSE244 Compilers
13
Evaluation
• Course evaluation
– Six homeworks (50%)
• 5 mandatory
• 1 extra credit
– One midterm (20%)
– One final (30%)
• Exams
– Open book
CSE244 Compilers
14
Six Homeworks
• One project to rule them all...
And in the Darkness Bind You...
• Purpose
– First Hand Experience with compiler technology
• Six Homeworks are connected
– Scanning
– Parsing
– Analysis
– IR Code Generation
– IR Optimization
– Machine Code Generation
CSE244 Compilers
15
The Source Language
C++
C
Java
CSE244 Compilers
C--
16
C-• The Object Oriented Language For Dummies
– C-- supports
• Classes
• Inheritance
• Polymorphism
• 2 basic types
• Arrays
• Simple control primitives
– if-then-else
– while
CSE244 Compilers
17
C-- Example
class Foo {
int fact(int n) {
return 0;
}
int fib(int x) {
return 1;
}
};
class Main extends Foo {
Main() {
int x;
x = fact(5);
}
int fact(int n) {
if (n==0)
return 1;
else
return n * fact(n-1);
}
};
CSE244 Compilers
18
The Target Language
• Something realistic...
• Something challenging...
• Something useful...
It’s True!
We will generate code for a Pentium Class
Machine Running Eithter
Windows or Linux
CSE244 Compilers
19
But......
• Won’t that be hard?
• No!
– My C-- compiler
• 6000 lines of code
• Written in < 10 days
– Your C-- compiler
• Will use some of my code...
CSE244 Compilers
20
Really ?
global main
extern printf
extern malloc
section .data
D_0:
dd block_0,block_1
D_1:
dd block_3,block_1,block_2
section .textmain:
push 4
; push constant
call malloc
; call to C function
add esp,4
; pop argument
mov [eax], dword D_1
; move source into dest
push eax
; push variable
mov eax, dword [eax]
; get the VTBL
mov eax, dword [eax+8]
; get the k=2 method
call eax
; call the method
add esp,4
; pop args
ret
block_0:
mov [esp-4], dword ebp
; save old BP
mov ebp, dword esp
; set BP to SP
mov [ebp-8], dword esp
; save old SP
sub esp,8
; reserve space
mov eax, dword 0
; write result in output register
mov esp, dword [ebp-8]
; restore old SP
mov ebp, dword [ebp-4]
; restore old BP
ret
; return from functionblock_1:
mov [esp-4], dword ebp
; save old BP
mov ebp, dword esp
; set BP to SP
mov [ebp-8], dword esp
; save old SP
sub esp,8
; reserve space for
mov eax, dword 1
; write result in output register
mov esp, dword [ebp-8]
; restore old SP
mov ebp, dword [ebp-4]
; restore old BP
ret
; return from function
block_3:
mov [esp-4], dword ebp
; save old BP
mov ebp, dword esp
; set BP to SP
mov [ebp-8], dword esp
; save old SP
sub esp,20
; reserve space
CSE244 mov
Compilers
eax, dword [ebp+8]
; move o2
cmp eax,0
mov eax,0
sete ah
cmp eax,0
jz block_5
; do the operation
; compare to 0 to set CC
; transfer to true block
block_4:
mov eax, dword 1
jmp block_6
dword [ebp+8] ; move o2
sub eax,1
push eax
mov ebx, dword [ebp+4]
ebx
; push variable
mov ebx, dword [ebx]
mov ebx, dword [ebx]
call ebx
add esp,8
mov ecx, dword [ebp+8]
imul ecx,eax
mov eax, dword ecx
block_6:
mov esp, dword [ebp-8]
mov ebp, dword [ebp-4]
ret
block_2:
mov [esp-4], dword ebp
mov ebp, dword esp
mov [ebp-8], dword esp
sub esp,12
push 5
mov eax, dword [ebp+4]
eax
; push variable
mov eax, dword [eax]
mov eax, dword [eax]
call eax
add esp,8
mov esp, dword [ebp-8]
mov ebp, dword [ebp-4]
ret
; write result in output register
; transfer controlblock_5:
mov eax,
; do the operation
; push variable
; get argument in register
push
; get the VTBL
; get the k=0 method
; call the method
; pop args
; move o2
; do the operation
; write result in output register
; restore old SP
; restore old BP
; return from function
; save old BP
; set BP to SP
; save old SP
; reserve space
; push constant
; get argument in register
push
This is NASM assembly
generated for the C-example shown earlier.
; get the VTBL
; get the k=0 method
; call the method
; pop args
; restore old SP
; restore old BP
; return from function
21
Overview
• Objectives
• Structure of the course
• Evaluation
• Compiler Introduction
– A compiler bird’s eye view
• Lexical analysis
• Parsing
• Semantic analysis
• Code generation
• Optimization
CSE244 Compilers
22
Compiler Classes
• Compilers Viewed from Many Perspectives
Single Pass
Multiple Pass
Load & Go
Debugging
Optimizing
Construction
Functional
• However, All utilize same basic tasks to accomplish
their actions
CSE244 Compilers
23
Compiler Structure
• Two fundamental sets of issues
Analysis
Text analysis
Syntactic analysis
Structural analysis
Synthesis
Program generation
Program optimization
• Our Focus:
– Both
CSE244 Compilers
24
Some Good news!
• Tools do exist for
– Lexical and syntactic analysis
– Note: it was not always the case.
• Structure / Syntax directed editors:
– Force “syntactically” correct code to be entered
• Pretty Printers:
– Standardized version for program structure (i.e.,indenting)
• Static Checkers:
– A “quick” compilation to detect rudimentary errors
• Interpreters:
– “real” time execution of code a “line-at-a-time”
CSE244 Compilers
25
Phases of compilation
Source Program
1
2
3
Symbol-table
Manager
Syntax Analyzer
Semantic Analyzer
Error Handler
4
5
1, 2, 3 : Analysis
4, 5, 6 : Synthesis
Lexical Analyzer
6
Intermediate Code
Generator
Code Optimizer
Code Generator
Target Program
CSE244 Compilers
26
Relocatable
Source Program
1
2
3
4
5
Pre-Processor
Compiler
Assembler
Relocatable
Machine Code
Loader Link/Editor
Library,
relocatable object
files
Executable
CSE244 Compilers
27
Analysis
Source Program
Language Analysis Phases
1
2
3
Lexical Analyzer
Syntax Analyzer
Semantic Analyzer
Error Handler
4
5
6
Intermediate
Code Generator
Code Optimizer
Code Generator
Target Program
CSE244 Compilers
28
Lexical analysis
• Purpose
– Slice the sequence of symbols into tokens
Date x := new Date ( System.today( ) + 30 ) ;
Id
Date
CSE244 Compilers
Date Id System
Symbol (
Symbol +
Symbol )
Id x
Keyword new
Symbol .
Symbol )
Integer 30
Symbol ;
Symbol :=
Symbol (
Id Today
Id
29
Syntax Analysis (parsing)
• Purpose
– Organize tokens in sentences based on grammar
Symbol
)
Symbol
Symbol
.
)
new
Symbol (
Symbol :=
Symbol +
Symbol (
Symbol
;
Keyw ord
CSE244 Compilers
30
What is a grammar ?
• Grammar is a Set of Rules Which Govern
– The Interdependencies &
– Structure Among the Tokens
statement
assignment statement, or while
is an
statement, or if statement, or ...
assignment
statement
is an identifier := expression ;
expression
(expression), or expression +
expression, or expression *
is an
expression, or number, or
identifier, or ...
CSE244 Compilers
31
Summary so far…
• Turn a symbol stream into a parse tree
CSE244 Compilers
32
Semantic Analysis
• Purpose
– Determine Unique / Unambiguous Interpretation
– Catch errors related to program meaning
• Examples
– Natural Language
• “John Took Picture of Mary Out on the Patio”
– Programming Language
•
•
•
•
Wrong types
Missing declaration
Missing methods
Ambiguous statements…
CSE244 Compilers
33
Semantic Analysis
• Main task
– Type checking
– Many Different Situations
Real := int + char ;
A[int] := A[real] + int ;
while char <> int
do
…. Etc.
– Primary tool
• Symbol table
CSE244 Compilers
34
Summary so far...
• Turned a symbol stream into an annotated parse
tree
CSE244 Compilers
35
Analysis
Source Program
1
2
3
Lexical Analyzer
Syntax Analyzer
Semantic Analyzer
Error Handler
Synthesis Phases
4
5
6
Intermediate
Code Generator
Code Optimizer
Code Generator
Target Program
CSE244 Compilers
36
Intermediate Code
• What is intermediate code ?
– A low level representation
– A simple representation
– Easy to reason about
– Easy to manipulate
– Programming Language independent
– Hardware independent
CSE244 Compilers
37
IR Code example
• Three-address code
– Three operands per instruction
– Each assignment instruction has at most one operator
on the right side
– Instructions fix the order in which operations are to be
done
– Generate a temporal name to hold value computed
• Example
Position = initial + rate *60
id1
CSE244 Compilers
id2
id3
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
38
Code Optimization
• Purpose
– Improve the intermediate code
• Get rid of redundant / dead code
• Get rid of redundant computation
• Reorganize code
• Schedule instructions
• Factor out code
– Improve the machine code
• Register allocation
• Instruction scheduling [for specific hardware]
CSE244 Compilers
39
Machine Code Generation
• Purpose
– Generate code for the target architecture
– Code is relocatable
– Accounts for platforms specific
• Registers
– IA32: 6 GP registers (int)
– MIPS: 32 GP registers (int)
• Calling convention
– IA32: stack-based
– MIPS: register based
– Sparc: register sliding window based
CSE244 Compilers
40
Machine code generation
• Pragmatics
– Generate assembly
– Let an assembler produced the binary code.
CSE244 Compilers
41
Assemblers
• Assembly code: names are used for instructions, and
names are used for memory addresses.
MOV a, R1
ADD #2, R1
MOV R1, b
• Two-pass Assembly:
– First Pass: all identifiers are assigned to memory addresses (0offset)
– e.g. substitute 0 for a, and 4 for b
– Second Pass: produce relocatable machine code:
0001 01 00 00000000 *
0011 01 10 00000010
0010 01 00 00000100 *
CSE244 Compilers
relocation
bit
42
Linker & Loader
• Loader
– Input
• Relocatable code in a file
– Output
• Bring executable into virtual address space. Relocate “shared”
libs.
• Linker
– Input
• Many relocatable machine code files (object files)
– Output
• A single executable file with all the object file glued together.
– Need to relocate each object file as needed
CSE244 Compilers
43
Pre-processors
• Purpose
– Macro processing
– Performs text replacement (editing)
– Performs file concatenation (#include)
• Example
– In C, #define does not exist
– In C, #define is handled by a preprocessor
#define X 3
#define Y A*B+C
#define Z getchar()
CSE244 Compilers
44
Modern Compilers
• Motto
– Re-targetable compilers
• Organization
– Front-end
• The analysis
• The generation of intermediate code
– Flexible IR
– Back-end
• The generation of machine code
• The architecture-specific optimization
CSE244 Compilers
45
Modern Compiler Example
• Gcc
– Front-ends for
• C,C++,FORTRAN,Objective-C,Java
– IR
• Static Single Assignment (SSA)
– Back-ends for
• IA32, PPC, Ultra, MIPS, ARM, Itanium, DEC, Alpha, VMS, …
CSE244 Compilers
46
Ahead
• Today’s lecture
– Reading
• Chapter 1
– For the inquisitive mind
• Chapter 2 is an overview of the entire process. Useful to
browse and refer to later on.
• Next Lecture
– Scanning
– Reading
• Chapter 3
CSE244 Compilers
47