* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download CSE244 Compiler (a.k.a. Programming Language Translation)
Survey
Document related concepts
Library (computing) wikipedia , lookup
Indentation style wikipedia , lookup
Go (programming language) wikipedia , lookup
Structured programming wikipedia , lookup
Comment (computer programming) wikipedia , lookup
Coding theory wikipedia , lookup
C Sharp syntax wikipedia , lookup
GNU Compiler Collection wikipedia , lookup
Assembly language wikipedia , lookup
C Sharp (programming language) wikipedia , lookup
Program optimization wikipedia , lookup
Interpreter (computing) wikipedia , lookup
Transcript
CSE4100 Compiler (a.k.a. Programming Language Translation) Notes credit go to Me Laurent Michel Aggelos Kiayias Steven Demurjian Robert La Barre CSE CSE CSE CSE UTRC Overview • Objectives • Structure of the course • Evaluation • Compiler Introduction – A compiler bird’s eye view • Lexical analysis • Parsing (Syntax analysis) • Semantic analysis • Code generation • Optimization CSE244 Compilers 2 Information • Course web page – Will be on HuskyCT – Currently • http://www.engr.uconn.edu/~weiwei/ • Instructor – Wei Wei – [email protected] • Office hour – TuTh 3:30pm ~ 4:30pm – Wednesday 1:30pm ~ 4:30pm CSE244 Compilers 3 Objectives • Compilers are…. – Ubiquitous in Computer Science – Central to symbolic processing – Relates to • Theory of computing • Practice of computing • Programming languages • Operating Systems • Computer Architecture CSE244 Compilers 4 Purpose • Simple intent – Translate • From Source Language • To Target Language CSE244 Compilers 5 Translate… Why? • Languages offer – Abstractions – At different levels • From low – Good for machines…. • To high – Good for humans…. Let the computer Do the heavy lifting. CSE244 Compilers 6 Translate… How ? • Three approaches – Interpreted – Compiled – Mixed CSE244 Compilers 7 Interpreter • Motivation… – Easiest to implement! • Upside ? • Downside ? • Phases – Lexical analysis – Parsing – Semantic checking – Interpretation CSE244 Compilers 8 Compiler • Motivation – It is natural! • Upside? • Downside? • Phases – [Interpreter] – Code Generation – Code Optimization – Link & load CSE244 Compilers 9 Mixed • Motivation – The best of two breeds… • Upside ? • Downside? CSE244 Compilers 10 Objectives • Learn about compilers because… – It helps to better program – Understand the tools of the trade – Many languages are compiled • • • • • • Programming Languages Communication Languages Presentation Languages Hardware Languages Formatting Languages Query Languages [C,C#,ML,LISP,…] [XML,HTML,…] [CSS,SGML,…] [VHDL,…] [Postscript,troff,LaTeX,…] [SQL & friends] – Many assistive tools use compiler technology… • Syntax highlighting • Type assist - type completion – You may write/use compiler technology! CSE244 Compilers 11 Overview • Objectives • Structure of the course • Evaluation • Compiler Introduction – A compiler bird’s eye view • Lexical analysis • Parsing • Semantic analysis • Code generation • Optimization CSE244 Compilers 12 Course structure • A reflection of the topic – Lexical analysis – Parsing – Semantic analysis – Midterm – Runtime structures – Intermediate code generation – Machine code generation – Optimization – Final CSE244 Compilers 13 Evaluation • Course evaluation – Six homeworks (50%) • 5 mandatory • 1 extra credit – One midterm (20%) – One final (30%) • Exams – Open book CSE244 Compilers 14 Six Homeworks • One project to rule them all... And in the Darkness Bind You... • Purpose – First Hand Experience with compiler technology • Six Homeworks are connected – Scanning – Parsing – Analysis – IR Code Generation – IR Optimization – Machine Code Generation CSE244 Compilers 15 The Source Language C++ C Java CSE244 Compilers C-- 16 C-• The Object Oriented Language For Dummies – C-- supports • Classes • Inheritance • Polymorphism • 2 basic types • Arrays • Simple control primitives – if-then-else – while CSE244 Compilers 17 C-- Example class Foo { int fact(int n) { return 0; } int fib(int x) { return 1; } }; class Main extends Foo { Main() { int x; x = fact(5); } int fact(int n) { if (n==0) return 1; else return n * fact(n-1); } }; CSE244 Compilers 18 The Target Language • Something realistic... • Something challenging... • Something useful... It’s True! We will generate code for a Pentium Class Machine Running Eithter Windows or Linux CSE244 Compilers 19 But...... • Won’t that be hard? • No! – My C-- compiler • 6000 lines of code • Written in < 10 days – Your C-- compiler • Will use some of my code... CSE244 Compilers 20 Really ? global main extern printf extern malloc section .data D_0: dd block_0,block_1 D_1: dd block_3,block_1,block_2 section .textmain: push 4 ; push constant call malloc ; call to C function add esp,4 ; pop argument mov [eax], dword D_1 ; move source into dest push eax ; push variable mov eax, dword [eax] ; get the VTBL mov eax, dword [eax+8] ; get the k=2 method call eax ; call the method add esp,4 ; pop args ret block_0: mov [esp-4], dword ebp ; save old BP mov ebp, dword esp ; set BP to SP mov [ebp-8], dword esp ; save old SP sub esp,8 ; reserve space mov eax, dword 0 ; write result in output register mov esp, dword [ebp-8] ; restore old SP mov ebp, dword [ebp-4] ; restore old BP ret ; return from functionblock_1: mov [esp-4], dword ebp ; save old BP mov ebp, dword esp ; set BP to SP mov [ebp-8], dword esp ; save old SP sub esp,8 ; reserve space for mov eax, dword 1 ; write result in output register mov esp, dword [ebp-8] ; restore old SP mov ebp, dword [ebp-4] ; restore old BP ret ; return from function block_3: mov [esp-4], dword ebp ; save old BP mov ebp, dword esp ; set BP to SP mov [ebp-8], dword esp ; save old SP sub esp,20 ; reserve space CSE244 mov Compilers eax, dword [ebp+8] ; move o2 cmp eax,0 mov eax,0 sete ah cmp eax,0 jz block_5 ; do the operation ; compare to 0 to set CC ; transfer to true block block_4: mov eax, dword 1 jmp block_6 dword [ebp+8] ; move o2 sub eax,1 push eax mov ebx, dword [ebp+4] ebx ; push variable mov ebx, dword [ebx] mov ebx, dword [ebx] call ebx add esp,8 mov ecx, dword [ebp+8] imul ecx,eax mov eax, dword ecx block_6: mov esp, dword [ebp-8] mov ebp, dword [ebp-4] ret block_2: mov [esp-4], dword ebp mov ebp, dword esp mov [ebp-8], dword esp sub esp,12 push 5 mov eax, dword [ebp+4] eax ; push variable mov eax, dword [eax] mov eax, dword [eax] call eax add esp,8 mov esp, dword [ebp-8] mov ebp, dword [ebp-4] ret ; write result in output register ; transfer controlblock_5: mov eax, ; do the operation ; push variable ; get argument in register push ; get the VTBL ; get the k=0 method ; call the method ; pop args ; move o2 ; do the operation ; write result in output register ; restore old SP ; restore old BP ; return from function ; save old BP ; set BP to SP ; save old SP ; reserve space ; push constant ; get argument in register push This is NASM assembly generated for the C-example shown earlier. ; get the VTBL ; get the k=0 method ; call the method ; pop args ; restore old SP ; restore old BP ; return from function 21 Overview • Objectives • Structure of the course • Evaluation • Compiler Introduction – A compiler bird’s eye view • Lexical analysis • Parsing • Semantic analysis • Code generation • Optimization CSE244 Compilers 22 Compiler Classes • Compilers Viewed from Many Perspectives Single Pass Multiple Pass Load & Go Debugging Optimizing Construction Functional • However, All utilize same basic tasks to accomplish their actions CSE244 Compilers 23 Compiler Structure • Two fundamental sets of issues Analysis Text analysis Syntactic analysis Structural analysis Synthesis Program generation Program optimization • Our Focus: – Both CSE244 Compilers 24 Some Good news! • Tools do exist for – Lexical and syntactic analysis – Note: it was not always the case. • Structure / Syntax directed editors: – Force “syntactically” correct code to be entered • Pretty Printers: – Standardized version for program structure (i.e.,indenting) • Static Checkers: – A “quick” compilation to detect rudimentary errors • Interpreters: – “real” time execution of code a “line-at-a-time” CSE244 Compilers 25 Phases of compilation Source Program 1 2 3 Symbol-table Manager Syntax Analyzer Semantic Analyzer Error Handler 4 5 1, 2, 3 : Analysis 4, 5, 6 : Synthesis Lexical Analyzer 6 Intermediate Code Generator Code Optimizer Code Generator Target Program CSE244 Compilers 26 Relocatable Source Program 1 2 3 4 5 Pre-Processor Compiler Assembler Relocatable Machine Code Loader Link/Editor Library, relocatable object files Executable CSE244 Compilers 27 Analysis Source Program Language Analysis Phases 1 2 3 Lexical Analyzer Syntax Analyzer Semantic Analyzer Error Handler 4 5 6 Intermediate Code Generator Code Optimizer Code Generator Target Program CSE244 Compilers 28 Lexical analysis • Purpose – Slice the sequence of symbols into tokens Date x := new Date ( System.today( ) + 30 ) ; Id Date CSE244 Compilers Date Id System Symbol ( Symbol + Symbol ) Id x Keyword new Symbol . Symbol ) Integer 30 Symbol ; Symbol := Symbol ( Id Today Id 29 Syntax Analysis (parsing) • Purpose – Organize tokens in sentences based on grammar Symbol ) Symbol Symbol . ) new Symbol ( Symbol := Symbol + Symbol ( Symbol ; Keyw ord CSE244 Compilers 30 What is a grammar ? • Grammar is a Set of Rules Which Govern – The Interdependencies & – Structure Among the Tokens statement assignment statement, or while is an statement, or if statement, or ... assignment statement is an identifier := expression ; expression (expression), or expression + expression, or expression * is an expression, or number, or identifier, or ... CSE244 Compilers 31 Summary so far… • Turn a symbol stream into a parse tree CSE244 Compilers 32 Semantic Analysis • Purpose – Determine Unique / Unambiguous Interpretation – Catch errors related to program meaning • Examples – Natural Language • “John Took Picture of Mary Out on the Patio” – Programming Language • • • • Wrong types Missing declaration Missing methods Ambiguous statements… CSE244 Compilers 33 Semantic Analysis • Main task – Type checking – Many Different Situations Real := int + char ; A[int] := A[real] + int ; while char <> int do …. Etc. – Primary tool • Symbol table CSE244 Compilers 34 Summary so far... • Turned a symbol stream into an annotated parse tree CSE244 Compilers 35 Analysis Source Program 1 2 3 Lexical Analyzer Syntax Analyzer Semantic Analyzer Error Handler Synthesis Phases 4 5 6 Intermediate Code Generator Code Optimizer Code Generator Target Program CSE244 Compilers 36 Intermediate Code • What is intermediate code ? – A low level representation – A simple representation – Easy to reason about – Easy to manipulate – Programming Language independent – Hardware independent CSE244 Compilers 37 IR Code example • Three-address code – Three operands per instruction – Each assignment instruction has at most one operator on the right side – Instructions fix the order in which operations are to be done – Generate a temporal name to hold value computed • Example Position = initial + rate *60 id1 CSE244 Compilers id2 id3 t1 = inttofloat(60) t2 = id3 * t1 t3 = id2 + t2 id1 = t3 38 Code Optimization • Purpose – Improve the intermediate code • Get rid of redundant / dead code • Get rid of redundant computation • Reorganize code • Schedule instructions • Factor out code – Improve the machine code • Register allocation • Instruction scheduling [for specific hardware] CSE244 Compilers 39 Machine Code Generation • Purpose – Generate code for the target architecture – Code is relocatable – Accounts for platforms specific • Registers – IA32: 6 GP registers (int) – MIPS: 32 GP registers (int) • Calling convention – IA32: stack-based – MIPS: register based – Sparc: register sliding window based CSE244 Compilers 40 Machine code generation • Pragmatics – Generate assembly – Let an assembler produced the binary code. CSE244 Compilers 41 Assemblers • Assembly code: names are used for instructions, and names are used for memory addresses. MOV a, R1 ADD #2, R1 MOV R1, b • Two-pass Assembly: – First Pass: all identifiers are assigned to memory addresses (0offset) – e.g. substitute 0 for a, and 4 for b – Second Pass: produce relocatable machine code: 0001 01 00 00000000 * 0011 01 10 00000010 0010 01 00 00000100 * CSE244 Compilers relocation bit 42 Linker & Loader • Loader – Input • Relocatable code in a file – Output • Bring executable into virtual address space. Relocate “shared” libs. • Linker – Input • Many relocatable machine code files (object files) – Output • A single executable file with all the object file glued together. – Need to relocate each object file as needed CSE244 Compilers 43 Pre-processors • Purpose – Macro processing – Performs text replacement (editing) – Performs file concatenation (#include) • Example – In C, #define does not exist – In C, #define is handled by a preprocessor #define X 3 #define Y A*B+C #define Z getchar() CSE244 Compilers 44 Modern Compilers • Motto – Re-targetable compilers • Organization – Front-end • The analysis • The generation of intermediate code – Flexible IR – Back-end • The generation of machine code • The architecture-specific optimization CSE244 Compilers 45 Modern Compiler Example • Gcc – Front-ends for • C,C++,FORTRAN,Objective-C,Java – IR • Static Single Assignment (SSA) – Back-ends for • IA32, PPC, Ultra, MIPS, ARM, Itanium, DEC, Alpha, VMS, … CSE244 Compilers 46 Ahead • Today’s lecture – Reading • Chapter 1 – For the inquisitive mind • Chapter 2 is an overview of the entire process. Useful to browse and refer to later on. • Next Lecture – Scanning – Reading • Chapter 3 CSE244 Compilers 47