Chapter 7
Introduction to Languages and Compiler
Winter 2007 SEG2101 Chapter 7

Contents
• Computer architecture
• Compiler
• Grammars
• Formal languages
• Parse trees
• Ambiguity
• Regular expressions

Von Neumann Architecture
[figure]

Compiler
A compiler is a program that reads a program written in one language – the source language – and translates it into an equivalent program in another language – the target language.

The Compilation Process
[figure]

Grammars
• A grammar is defined as a 4-tuple: the alphabet Σ, the nonterminals N, the productions P, and a goal symbol S.
• (Σ, N, P, S)
• Σ, N, and P are sets; S is a particular element of N.

Alphabets and Strings
• Σ is the alphabet, or set of terminals.
• It is a finite set consisting of all the input characters or symbols that can be arranged to form sentences in the language.
• English: the letters A to Z and, in our definition, punctuation and space symbols
• Programming language: usually some well-defined character set such as ASCII

Alphabets and Strings (II)
• A compiler is usually defined with two grammars.
• The alphabet for the scanner grammar is ASCII or some subset of it.
• The alphabet for the parser grammar is the set of tokens generated by the scanner, not ASCII at all.

An Example of Strings
• Σ = {a, b, c, d}
• Possible strings of terminals from Σ include aaa, aabbccdd, d, cba, abab, ccccccccccacccc, and so on.

Formal Languages
• Σ: the alphabet, a finite set consisting of all input characters or symbols.
• Σ*: the closure of the alphabet, the set of all possible strings over Σ, including the empty string.
• A (formal) language is some specified subset of Σ*.

Nonterminals
• The nonterminal set N is a finite set of symbols not in the alphabet.
• A particular nonterminal, the goal symbol S, represents exactly all the strings in the language.
• The goal symbol is also often called the start symbol because we start with it.
• The set of terminals and the set of nonterminals, taken together, are called the vocabulary of the grammar.

Productions
• The productions P of a grammar are a set of rewriting rules, each written as two strings of symbols separated by an arrow.
• The symbols on each side of the arrow may be drawn from both terminals and nonterminals, subject to certain restrictions on the form of the grammar.

An Example Grammar
• G1 = ({a, b, c}, {A, B}, {A → aB, A → bB, A → cB, B → a, B → b, B → c}, A)
• The grammar generates nine two-letter strings: aa, ab, ac, ba, bb, bc, ca, cb, cc.

Syntax and Semantics
• Syntax: the syntax of a programming language is the form of its expressions, statements, and program units.
• Semantics: the meaning of those expressions, statements, and program units.
• Example: if (<expr>) <statement>

Sentences, Lexeme, Token
• Sentences: the strings of a language are called sentences or statements.
• Lexeme: the lexemes of a programming language include its identifiers, literals, operators, and special words.
• Token: a token of a language is a category of its lexemes.

Lexeme and Token
Index = 2 * count + 17;

Lexemes    Tokens
Index      identifier
=          equal_sign
2          int_literal
*          multi_op
count      identifier
+          plus_op
17         int_literal
;          semicolon

The Role of Grammars
• The grammar of a language defines the correct form for sentences in that language.
• Grammars are the formal language-generation mechanism commonly used to describe the syntax of programming languages.

BNF: Backus-Naur Form
• Backus presented a new formal notation for specifying programming language syntax.
• Naur modified the notation slightly.
• The result is known as Backus-Naur Form, or BNF.
• BNF is a very natural notation for describing syntax.
• The terms BNF and context-free grammar (or simply grammar) are used interchangeably.

BNF
• Metalanguage: a language used to describe another language. BNF is a metalanguage for programming languages.
• Abstraction: the symbol on the left-hand side of the arrow
• Definition: the text to the right of the arrow
• Rule (production): the abstraction and its definition together are called a rule.

BNF Description (a simple C assignment statement)
<assign> → <var> = <expression>
• The whole line is a rule (production).
• LHS (left-hand side): the abstraction, <assign>
• RHS (right-hand side): the definition, <var> = <expression>

Nonterminal and Terminal
• Nonterminal symbol: the abstraction in a BNF description or grammar
• Terminal symbol: the lexemes and tokens of the rules
• A BNF description or grammar is simply a collection of rules.
• Nonterminals can have two or more distinct definitions.
• Multiple definitions can be written as a single rule, with the different definitions separated by |, meaning logical OR.
<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>

List of Syntactic Elements
• BNF does not include the ellipsis (…).
• BNF uses recursion instead.
• A rule is recursive if its LHS appears in its RHS.
• e.g., <ident_list> → identifier | identifier , <ident_list>

A Grammar
[content not recovered]

A Derivation of a Program
[content not recovered]

Another Grammar
[content not recovered]

A Derivation of a Statement
[content not recovered]

Parse Tree
Grammars naturally describe the hierarchical syntactic structure of the sentences of the languages they define. These hierarchical structures are called parse trees.

Ambiguous Grammar
• A grammar that generates a sentence for which there are two or more distinct parse trees is said to be ambiguous.
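The definition of ambiguity can be checked mechanically for small cases. The sketch below is only an illustration and is not taken from the slides: it assumes a hypothetical grammar <expr> → <expr> + <expr> | <expr> * <expr> | id, and the function name count_parse_trees is likewise made up. It counts how many distinct parse trees a token sequence has under that grammar; a count above one for any sentence shows that the assumed grammar is ambiguous.

```python
from functools import lru_cache

# Hypothetical ambiguous grammar (an assumption for illustration only):
#   <expr> -> <expr> + <expr> | <expr> * <expr> | id


def count_parse_trees(tokens):
    """Return the number of distinct parse trees for the token sequence."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <expr>.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every operator position as the root of the tree; the two
        # sides are then parsed independently.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


sentence = "id + id * id".split()
print(count_parse_trees(sentence))  # 2
```

The two trees correspond to grouping the sentence as (id + id) * id or as id + (id * id). A grammar that fixes operator precedence would give this sentence exactly one tree.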

Ambiguity
[figure]

Regular Expressions
A regular expression is a method of describing strings.
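As a rough illustration of regular expressions describing strings, the sketch below uses Python's re module to split the statement from the "Lexeme and Token" slide into lexemes and label each with its token. The token names come from that slide; the patterns themselves, the TOKEN_SPEC table, and the scan function are assumptions made for this example, not part of the course material.

```python
import re

# Token categories follow the "Lexeme and Token" slide; the patterns are
# assumptions made for this sketch.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("int_literal", r"[0-9]+"),
    ("equal_sign", r"="),
    ("multi_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes and is discarded
]
# One combined pattern with a named group per token category.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Return (token, lexeme) pairs; raise ValueError on an unknown character."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.lastgroup, match.group()))
        pos = match.end()
    return pairs


for token, lexeme in scan("Index = 2 * count + 17;"):
    print(f"{lexeme:<6} {token}")
```

Each pattern describes one category of lexemes, which is what lets a scanner hand the parser a stream of tokens rather than raw characters.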