Languages and Compilers (SProg og Oversættere)
Bent Thomsen
Department of Computer Science, Aalborg University
With acknowledgement to Norm Hutchinson, whose slides this lecture is based on.

What is this course about?
• Programming Language Design
  – Concepts and Paradigms
  – Ideas and philosophy
  – Syntax and Semantics
• Compiler Construction
  – Tools and Techniques
  – Implementations
  – The nuts and bolts

Tools and Techniques
• Front-end: Syntax analysis
  – How to build a Scanner and Lexer
    • By hand in Java
    • Using Tools
      – JavaCC
      – SableCC
      – Lex and Yacc (JLex and JavaCUP)
      – (lg and pg – compiler tools for .Net)
• Middle-part: Contextual Analysis
• Back-end: Code Generation
  – Target Machines
    • TAM
    • JVM
    • (.Net CLR)

Today’s lecture
• Two topics
  – Treating Compilers and Interpreters as black-boxes
    • Tombstone- or T-diagrams
  – A first look inside the black-box
    • Your guided tour

Terminology
Q: Which programming languages play a role in this picture?
[Diagram: input (source program) -> Translator -> output (object program)]
– The source program is expressed in the source language.
– The object program is expressed in the target language.
– The translator is expressed in the implementation language.
A: All of them!

Tombstone Diagrams
What are they?
– diagrams consisting of a set of “puzzle pieces” we can use to reason about language processors and programs
– different kinds of pieces
– combination rules (not all diagrams are “well formed”)
The pieces:
– Program P implemented in L
– Machine M implemented in hardware
– Translator from S to T (S -> T) implemented in L
– Interpreter for language M implemented in L

Tombstone diagrams: Combination rules
[Diagrams: four combinations of pieces, two marked OK! and two marked WRONG!. A program can be placed on machine M only if it is expressed in M, and a translator S -> T accepts only programs expressed in S.]

Compilation
Example: Compilation of C programs on an x86 machine
[Tombstone diagram: Tetris (in C) is fed to a C -> x86 compiler running on x86, which produces Tetris (in x86); that program then runs on the x86 machine.]

What is Tetris?
Tetris® The World's Most Popular Video Game
Since its commercial introduction in 1987, Tetris® has been established as the largest selling and most recognized global brand in the history of the interactive game software industry.
Simple, entertaining, and yet challenging, Tetris® can be found on more than 60 platforms. Over 65 million Tetris® units have been sold worldwide to date.

Cross compilation
Example: A C “cross compiler” from x86 to PPC
A cross compiler is a compiler which runs on one machine (the host machine) but emits code for another machine (the target machine).
[Tombstone diagram: Tetris (in C) is compiled by a C -> PPC compiler running on x86 into Tetris (in PPC), which is downloaded to the PPC machine and run there.]
Host ≠ Target
Q: Are cross compilers useful? Why would/could we use them?

Two Stage Compilation
A two-stage translator is a composition of two translators. The output of the first translator is provided as input to the second translator.
[Tombstone diagram: Tetris (in Java) is compiled by a Java->JVM compiler running on x86 into Tetris (in JVM code), which a JVM->x86 compiler running on x86 turns into Tetris (in x86).]

Compiling a Compiler
Observation: A compiler is a program! Therefore it can be provided as input to a language processor.
Example: compiling a compiler.
[Tombstone diagram: a Java->x86 compiler written in C is compiled by a C -> x86 compiler running on x86, giving a Java->x86 compiler expressed in x86.]

Interpreters
An interpreter is a language processor implemented in software, i.e. as a program.
Terminology: abstract (or virtual) machine versus real machine
Example: The Java Virtual Machine
[Tombstone diagram: Tetris (in JVM code) runs on a JVM interpreter expressed in x86, which runs on the x86 machine.]
Q: Why are abstract machines useful?

1) Abstract machines provide better platform independence.
[Tombstone diagrams: the same Tetris (in JVM code) runs on a JVM interpreter expressed in x86 on an x86 machine, and on a JVM interpreter expressed in PPC on a PPC machine.]

2) Abstract machines are useful for testing and debugging.
Example: Testing the “Ultima” processor using hardware emulation
[Tombstone diagrams: P (in Ultima code) running on an Ultima emulator expressed in x86 on an x86 machine is functionally equivalent to P running on the Ultima processor itself.]
Note: we don’t have to implement the Ultima emulator in x86; we can write it in a high-level language and compile it.

Interpreters versus Compilers
Q: What are the tradeoffs between compilation and interpretation?
Compilers typically offer more advantages when
– programs are deployed in a production setting
– programs are “repetitive”
– the instructions of the programming language are complex
Interpreters typically are a better choice when
– we are in a development/testing/debugging stage
– programs are run once and then discarded
– the instructions of the language are simple
– the execution speed is overshadowed by other factors
  • e.g. on a web server, where communication costs far outweigh execution time

Interpretive Compilers
Why? A tradeoff between fast(er) compilation and a reasonable runtime performance.
How? Use an “intermediate language”
• higher-level than machine code => easier to compile to
• lower-level than the source language => easy to implement as an interpreter
Example: A “Java Development Kit” for machine M
[Tombstone diagrams: a Java->JVM compiler expressed in M, and a JVM interpreter expressed in M.]

Interpretive Compilers
Example: Here is how we use our “Java Development Kit” to run a Java program P
[Tombstone diagrams: javac: P (in Java) is translated by the Java->JVM compiler running on M into P (in JVM code). java: P (in JVM code) runs on the JVM interpreter, which runs on M.]

Portable Compilers
Example: Two different “Java Development Kits”
Kit 1: a Java->JVM compiler expressed in M, and a JVM interpreter expressed in M.
Kit 2: a Java->JVM compiler expressed in JVM code, and a JVM interpreter expressed in M.
Q: Which one is “more portable”?

Portable Compilers
In the previous example we have seen that portability is not an “all or nothing” kind of deal. It is useful to talk about a “degree of portability”, measured by the percentage of code that needs to be rewritten when moving to a dissimilar machine (the smaller that percentage, the more portable the program). In practice 100% portability is as good as impossible.

Example: a “portable” compiler kit
Portable Compiler Kit:
– a Java->JVM compiler expressed in Java
– a Java->JVM compiler expressed in JVM code
– a JVM interpreter expressed in Java
Q: Suppose we want to run this kit on some machine M. How could we go about realizing that goal? (with the least amount of effort)

Example: a “portable” compiler kit
Q: Suppose we want to run this kit on some machine M. How could we go about realizing that goal?
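(with the least amount of effort)
[Tombstone diagram: the JVM interpreter written in Java is reimplemented in C; the C version is compiled by a C->M compiler running on M, giving a JVM interpreter expressed in M.]

Example: a “portable” compiler kit
This is what we have now:
– a Java->JVM compiler expressed in Java
– a Java->JVM compiler expressed in JVM code
– a JVM interpreter expressed in Java
– a JVM interpreter expressed in M
Now, how do we run our Tetris program?
[Tombstone diagrams: Tetris (in Java) is compiled into Tetris (in JVM code) by the Java->JVM compiler (in JVM code), which runs on the JVM interpreter on M. Tetris (in JVM code) then runs on the JVM interpreter on M.]

Bootstrapping
Remember our “portable compiler kit”:
– a Java->JVM compiler expressed in Java (we haven’t used this yet!)
– a Java->JVM compiler expressed in JVM code
– a JVM interpreter expressed in Java
– a JVM interpreter expressed in M
The Java->JVM compiler expressed in Java is written in the same language that it compiles!
Q: What can we do with a compiler written in itself? Is that useful at all?
• By implementing the compiler in (a subset of) its own language, we become less dependent on the target platform => more portable implementation.
• But… “chicken and egg problem”? How do we get around that?
  => BOOTSTRAPPING: requires some work to make the first “egg”.
There are many possible variations on how to bootstrap a compiler written in its own language.

Bootstrapping an Interpretive Compiler to Generate M code
Our “portable compiler kit”:
– a Java->JVM compiler expressed in Java
– a Java->JVM compiler expressed in JVM code
– a JVM interpreter expressed in Java
– a JVM interpreter expressed in M
Goal: we want to get a “completely native” Java compiler on machine M
[Tombstone diagram: P (in Java) is compiled by a Java->M compiler expressed in M, running on M, into P (in M).]

Bootstrapping an Interpretive Compiler to Generate M code
Idea: we will build a two-stage Java -> M compiler.
[Tombstone diagram: P (in Java) is compiled by a Java->JVM compiler expressed in M into P (in JVM code), which a JVM->M compiler expressed in M turns into P (in M). Both stages run on M.]
We will make the first stage by compiling the Java->JVM compiler expressed in JVM code.
To get the second stage we implement a JVM->M compiler in Java and compile it.

Bootstrapping an Interpretive Compiler to Generate M code
Step 1: implement a JVM->M compiler in Java
Step 2: compile it
[Tombstone diagram: the JVM->M compiler (in Java) is compiled by the Java->JVM compiler (in JVM code), running on the JVM interpreter on M, giving a JVM->M compiler expressed in JVM code.]
Step 3: compile the result

Bootstrapping an Interpretive Compiler to Generate M code
Step 3: “Self compile” the JVM->M compiler (in JVM code)
[Tombstone diagram: the JVM->M compiler (in JVM code) is given to itself as input while running on the JVM interpreter on M, giving a JVM->M compiler expressed in M.]
This is the second stage of our compiler!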
Step 4: use this to compile the Java compiler

Bootstrapping an Interpretive Compiler to Generate M code
Step 4: Compile the Java->JVM compiler into machine code
[Tombstone diagram: the Java->JVM compiler (in JVM code) is compiled by the JVM->M compiler (in M), running on M, giving a Java->JVM compiler expressed in M.]
The first stage of our compiler! We are DONE!

Full Bootstrap
A full bootstrap is necessary when we are building a new compiler from scratch.
Example: We want to implement an Ada compiler for machine M. We don’t currently have access to any Ada compiler (neither on M nor on any other machine).
Idea: Ada is very large, so we will implement the compiler in a subset of Ada (Ada-S) and bootstrap it from an Ada-S compiler written in another language (e.g. C).
Step 1: build a compiler for Ada-S in another language
[Tombstone diagram: v1, an Ada-S->M compiler expressed in C.]

Full Bootstrap
Step 1a: build a compiler (v1) for Ada-S in another language.
[Tombstone diagram: v1, an Ada-S->M compiler expressed in C.]
Step 1b: Compile the v1 compiler on M
[Tombstone diagram: v1 (Ada-S->M, in C) is compiled by a C->M compiler running on M, giving v1 (Ada-S->M, in M).]
This compiler can be used for bootstrapping on machine M, but we do not want to rely on it permanently!

Full Bootstrap
Step 2a: Implement v2 of the Ada-S compiler in Ada-S
[Tombstone diagram: v2, an Ada-S->M compiler expressed in Ada-S.]
Q: Is it hard to rewrite the compiler in Ada-S?
Step 2b: Compile the v2 compiler with the v1 compiler
[Tombstone diagram: v2 (Ada-S->M, in Ada-S) is compiled by v1 (Ada-S->M, in M) running on M, giving v2 (Ada-S->M, in M).]
We are now no longer dependent on the availability of a C compiler!

Full Bootstrap
Step 3a: Build a full Ada compiler in Ada-S
[Tombstone diagram: v3, an Ada->M compiler expressed in Ada-S.]
Step 3b: Compile with the v2 compiler
[Tombstone diagram: v3 (Ada->M, in Ada-S) is compiled by v2 (Ada-S->M, in M) running on M, giving v3 (Ada->M, in M).]
From this point on we can maintain the compiler in Ada: we write subsequent versions v4, v5, ... of the compiler in Ada and compile each with the previous version.

Half Bootstrap
We discussed the full bootstrap, which is required when we have no access to a compiler for our language at all.
Q: What if we have access to a compiler for our language on a different machine HM but want to develop one for TM?
We have: an Ada->HM compiler expressed in HM, and an Ada->HM compiler expressed in Ada.
We want: an Ada->TM compiler expressed in TM.
Idea: We can use cross compilation from HM to TM to bootstrap the TM compiler.
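The half-bootstrap idea can be checked mechanically by treating each tombstone piece as data. The following is a minimal Java sketch, not part of the original slides: the class `Tombstone`, the type `Translator` and the methods `compile` and `halfBootstrap` are hypothetical names chosen for illustration. It assumes a translator is fully described by its source, target and implementation language, and it enforces the two combination rules (a translator runs only on a machine whose language it is expressed in, and it accepts only input expressed in its source language).

```java
// Hypothetical model of tombstone pieces; all names are illustrative, not from the slides.
public class Tombstone {

    /** A translator from `source` to `target`, expressed in language `impl`. */
    public static final class Translator {
        public final String source;
        public final String target;
        public final String impl;

        public Translator(String source, String target, String impl) {
            this.source = source;
            this.target = target;
            this.impl = impl;
        }

        /**
         * Runs this translator on `machine`, with another translator as input.
         * Rule 1: this translator must be expressed in the machine's language.
         * Rule 2: the input must be expressed in this translator's source language.
         */
        public Translator compile(Translator input, String machine) {
            if (!impl.equals(machine)) {
                throw new IllegalArgumentException(
                    "WRONG: translator expressed in " + impl + " cannot run on " + machine);
            }
            if (!input.impl.equals(source)) {
                throw new IllegalArgumentException(
                    "WRONG: input expressed in " + input.impl + ", expected " + source);
            }
            // Same translator as the input, now expressed in this translator's target language.
            return new Translator(input.source, input.target, target);
        }

        @Override
        public String toString() {
            return source + "->" + target + " (in " + impl + ")";
        }
    }

    /** Replays the half bootstrap and returns the final compiler. */
    public static Translator halfBootstrap() {
        Translator adaToHmInHm = new Translator("Ada", "HM", "HM");   // what we have
        Translator adaToTmInAda = new Translator("Ada", "TM", "Ada"); // what we write
        // Compile on HM: a cross compiler that runs on HM but emits TM code.
        Translator cross = adaToHmInHm.compile(adaToTmInAda, "HM");
        // Cross compile the same source again: a compiler that runs on TM.
        return cross.compile(adaToTmInAda, "HM");
    }

    public static void main(String[] args) {
        System.out.println(halfBootstrap());
    }
}
```

Trying to run the Ada->TM source directly, for example `adaToTmInAda.compile(adaToTmInAda, "HM")`, is rejected by rule 1, which mirrors the diagrams marked WRONG! earlier in the lecture.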
Half Bootstrap
Idea: We can use cross compilation from HM to TM to bootstrap the TM compiler.
Step 1: Implement an Ada->TM compiler in Ada
[Tombstone diagram: an Ada->TM compiler expressed in Ada.]
Step 2: Compile on HM
[Tombstone diagram: the Ada->TM compiler (in Ada) is compiled by the Ada->HM compiler (in HM) running on HM, giving an Ada->TM compiler expressed in HM.]
The result is a cross compiler: it runs on HM but emits TM code.

Half Bootstrap
Step 3: Cross compile our TM compiler.
[Tombstone diagram: the Ada->TM compiler (in Ada) is compiled by the Ada->TM cross compiler (in HM) running on HM, giving an Ada->TM compiler expressed in TM.]
DONE!
From now on we can develop subsequent versions of the compiler completely on TM.

Bootstrapping to Improve Efficiency
The efficiency of programs and compilers:
Efficiency of programs:
- memory usage
- runtime
Efficiency of compilers:
- Efficiency of the compiler itself
- Efficiency of the emitted code
Idea: We start from a simple compiler (generating inefficient code) and develop a more sophisticated version of it. We can then use bootstrapping to improve the performance of the compiler.

Bootstrapping to Improve Efficiency
We have: an Ada->Mslow compiler expressed in Ada, and an Ada->Mslow compiler expressed in Mslow.
We implement: an Ada->Mfast compiler expressed in Ada.
Step 1:
[Tombstone diagram: the Ada->Mfast compiler (in Ada) is compiled by the Ada->Mslow compiler (in Mslow) running on M, giving an Ada->Mfast compiler expressed in Mslow.]
Step 2:
[Tombstone diagram: the Ada->Mfast compiler (in Ada) is compiled by the Ada->Mfast compiler (in Mslow) running on M, giving an Ada->Mfast compiler expressed in Mfast.]
Fast compiler that emits fast code!

Conclusion
• To write a good compiler you may have to write several simpler ones first.
• You have to think about the source language, the target language and the implementation language.
• Strategies for implementing a compiler
  1. Write it in machine code
  2. Write it in a lower level language and compile it using an existing compiler
  3. Write it in the same language that it compiles and bootstrap
• The work of a compiler writer is never finished; there is always version 1.x and version 2.0 and …

Compilation
So far we have treated language processors (including compilers) as “black boxes”.
Now we take a first look "inside the box": how compilers are built.
And we take a look at the different “phases” and their relationships.

The “Phases” of a Compiler
[Diagram: Source Program -> Syntax Analysis -> Abstract Syntax Tree -> Contextual Analysis -> Decorated Abstract Syntax Tree -> Code Generation -> Object Code. Syntax Analysis and Contextual Analysis also produce Error Reports.]

Different Phases of a Compiler
The different phases can be seen as different transformation steps to transform source code into object code.
The different phases correspond roughly to the different parts of the language specification:
• Syntax analysis <-> Syntax
• Contextual analysis <-> Contextual constraints
• Code generation <-> Semantics

Example Program
We now look at each of the three different phases in a little more detail. We look at each of the steps in transforming an example Triangle program into TAM code.

    ! This program is useless except for
    ! illustration
    let var n: integer;
        var c: char
    in begin
       c := '&';
       n := n+1
    end

1) Syntax Analysis
[Diagram: Source Program -> Syntax Analysis -> Abstract Syntax Tree, with Error Reports as a second output.]
Note: Not all compilers construct an explicit representation of an AST (e.g. a “single pass compiler” generally has no need to construct an AST).

1) Syntax Analysis -> AST
[AST of the example program, reconstructed from the flattened figure:]

    Program
      LetCommand
        SequentialDeclaration
          VarDecl
            Ident n
            SimpleT
              Ident Integer
          VarDecl
            Ident c
            SimpleT
              Ident Char
        SequentialCommand
          AssignCommand
            SimpleV
              Ident c
            Char.Expr
              Char.Lit '&'
          AssignCommand
            SimpleV
              Ident n
            BinaryExpr
              VNameExp
                SimpleV
                  Ident n
              Op +
              Int.Expr
                Int.Lit 1

2) Contextual Analysis -> Decorated AST
[Diagram: Abstract Syntax Tree -> Contextual Analysis -> Decorated Abstract Syntax Tree, with Error Reports as a second output.]
Contextual analysis:
• Scope checking: verify that all applied occurrences of identifiers are declared
• Type checking: verify that all operations in the program are used according to their type rules.
Annotate AST:
• Applied identifier occurrences => declaration
• Expressions => Type

2) Contextual Analysis -> Decorated AST
[Decorated AST of the example program, reconstructed from the flattened figure. Type annotations are shown after the node names; the links from applied identifiers to their declarations are not recoverable here.]

    Program
      LetCommand
        SequentialDeclaration
          VarDecl
            Ident n
            SimpleT
              Ident Integer
          VarDecl
            Ident c
            SimpleT
              Ident Char
        SequentialCommand
          AssignCommand
            SimpleV :char
              Ident c
            Char.Expr :char
              Char.Lit '&'
          AssignCommand
            SimpleV :int
              Ident n
            BinaryExpr :int
              VNameExp :int
                SimpleV :int
                  Ident n
              Op +
              Int.Expr :int
                Int.Lit 1

Contextual Analysis
Finds scope and type errors.
Example 1: an AssignCommand whose two sides have the types :int and :char
    ***TYPE ERROR (incompatible types in assigncommand)
Example 2: a SimpleV with Ident foo, where foo is not found
    ***SCOPE ERROR: undeclared variable foo

3) Code Generation
[Diagram: Decorated Abstract Syntax Tree -> Code Generation -> Object Code]
• Assumes that the program has been thoroughly checked and is well formed (scope & type rules)
• Takes into account the semantics of the source language as well as the target language.
• Transforms the source program into target code.

3) Code Generation

    let var n: integer;
        var c: char
    in begin
       c := '&';
       n := n+1
    end

[Diagram: the VarDecl node for n (Ident n, SimpleT, Ident Integer) is annotated with address = 0[SB].]

    PUSH 2
    LOADL 38
    STORE 1[SB]
    LOAD 0[SB]
    LOADL 1
    CALL add
    STORE 0[SB]
    POP 2
    HALT

Compiler Passes
• A pass is a complete traversal of the source program, or a complete traversal of some internal representation of the source program.
• A pass can correspond to a “phase” but it does not have to!
• Sometimes a single “pass” corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.

Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
[Diagram: Compiler Driver calls Syntactic Analyzer, which calls Contextual Analyzer and Code Generator.]

Multi Pass Compiler
A multi pass compiler makes several passes over the program.
The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
[Diagram: Compiler Driver calls three components in turn:
 Syntactic Analyzer (input: Source Text, output: AST),
 Contextual Analyzer (input: AST, output: Decorated AST),
 Code Generator (input: Decorated AST, output: Object Code).]

Example: The Triangle Compiler Driver

    public class Compiler {
      public static void compileProgram(...) {
        // One object per phase
        Parser parser = new Parser(...);
        Checker checker = new Checker(...);
        Encoder generator = new Encoder(...);
        // Pass 1: syntax analysis builds the AST
        Program theAST = parser.parse();
        // Pass 2: contextual analysis decorates the AST
        checker.check(theAST);
        // Pass 3: code generation
        generator.encode(theAST);
      }

      public static void main(String[] args) {
        ... compileProgram(...) ...
      }
    }

Compiler Design Issues

                             Single Pass                 Multi Pass
    Speed                    better                      worse
    Memory                   better for large programs   (potentially) better for small programs
    Modularity               worse                       better
    Flexibility              worse                       better
    “Global” optimization    impossible                  possible
    Source Language          single pass compilers are not possible for many programming languages

Language Issues
Example Pascal:
Pascal was explicitly designed to be easy to implement with a single pass compiler:
– Every identifier must be declared before its first use.

    var n:integer;
    procedure inc;
    begin n:=n+1 end;

    procedure inc;
    begin n:=n+1 end;      Undeclared Variable!
    var n:integer;

Language Issues
Example Pascal:
– Every identifier must be declared before it is used.
– How to handle mutual recursion then?

    procedure ping(x:integer);
    begin ... pong(x-1); ... end;
    procedure pong(x:integer);
    begin ... ping(x); ... end;

Language Issues
Example Pascal:
– Every identifier must be declared before it is used.
– How to handle mutual recursion then? With a forward declaration:

    procedure pong(x:integer); forward;
    procedure ping(x:integer);
    begin ... pong(x-1); ... end;      OK!
    procedure pong(x:integer);
    begin ... ping(x); ... end;

Language Issues
Example Java:
– identifiers can be used before they are declared.
– thus a Java compiler needs at least two passes

    class Example {
      void inc() { n = n + 1; }
      int n;
      void use() { n = 0; inc(); }
    }

Keep in mind
There are many issues influencing the design of a new programming language:
– Choice of paradigm
– Syntactic preferences
– Even the compiler implementation
  • e.g. number of passes
  • available tools
There are many issues influencing the design of a new compiler:
– Number of passes
– The source, target and implementation language
– Available tools
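The Java example shows why the number of passes depends on the source language: inc() refers to n before the compiler has seen the declaration of n. Below is a minimal Java sketch of the difference, not taken from the slides or from the Triangle compiler. It assumes a drastically simplified class model in which each member declares one name and uses some others; the names `TwoPassChecker`, `Member`, `checkSinglePass` and `checkTwoPass` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified model of a class body; not a real Java front end.
public class TwoPassChecker {

    /** A member declares one name and may use other names in its body. */
    public static final class Member {
        public final String declares;
        public final List<String> uses;

        public Member(String declares, String... uses) {
            this.declares = declares;
            this.uses = List.of(uses);
        }
    }

    /** Single pass: a use is an error unless the name was declared earlier in the text. */
    public static List<String> checkSinglePass(List<Member> members) {
        Set<String> declared = new HashSet<>();
        List<String> errors = new ArrayList<>();
        for (Member m : members) {
            declared.add(m.declares);
            for (String name : m.uses) {
                if (!declared.contains(name)) {
                    errors.add("undeclared: " + name);
                }
            }
        }
        return errors;
    }

    /** Two passes: pass 1 collects every declaration, pass 2 checks every use. */
    public static List<String> checkTwoPass(List<Member> members) {
        Set<String> declared = new HashSet<>();
        for (Member m : members) {          // pass 1
            declared.add(m.declares);
        }
        List<String> errors = new ArrayList<>();
        for (Member m : members) {          // pass 2
            for (String name : m.uses) {
                if (!declared.contains(name)) {
                    errors.add("undeclared: " + name);
                }
            }
        }
        return errors;
    }

    /** The members of class Example, in source order. */
    public static List<Member> example() {
        return List.of(
            new Member("inc", "n"),         // void inc() { n = n + 1; }
            new Member("n"),                // int n;
            new Member("use", "n", "inc")); // void use() { n = 0; inc(); }
    }

    public static void main(String[] args) {
        System.out.println("single pass: " + checkSinglePass(example()));
        System.out.println("two passes:  " + checkTwoPass(example()));
    }
}
```

On this model the single pass rejects the use of n inside inc(), exactly as a Pascal-style declare-before-use rule would, while the two-pass version accepts the class.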