Chapter 7
Introduction to
Languages and Compiler
Winter 2007
SEG2101 Chapter 7
Contents
• Computer architecture
• Compiler
• Grammars
• Formal languages
• Parse trees
• Ambiguity
• Regular expressions
Von Neumann Architecture
Compiler
A compiler is a program that reads a program written in one
language – the source language – and translates it into an
equivalent program in another language – the target language.
The Compilation Process
Grammars
• A grammar is defined as a 4-tuple: the
alphabet Σ, the nonterminals N, the
productions P, and a goal symbol S.
• (Σ, N, P, S)
• Σ, N, and P are sets; S is a particular element of
the set N.
Alphabets and Strings
• Σ is the alphabet, or set of terminals.
• It is a finite set consisting of all the input
characters or symbols that can be arranged
to form sentences in the language.
• English: the letters A to Z and, in our definition, punctuation and
space symbols
• Programming language: usually some well-defined character set such as ASCII
Alphabets and Strings (II)
• A compiler is usually defined with two
grammars.
• The alphabet for the scanner grammar is
ASCII or some subset of it.
• The alphabet for the parser grammar is the
set of tokens generated by the scanner, not
ASCII at all.
An Example of Strings
• Σ = {a, b, c, d}
• Possible strings of terminals from Σ include
aaa, aabbccdd, d, cba, abab,
ccccccccccacccc, and so on.
Formal Languages
• Σ: the alphabet, a finite set consisting of all
input characters or symbols.
• Σ*: the closure of the alphabet, the set of all
possible strings over Σ, including the empty
string ε.
• A (formal) language is some specified
subset of Σ*.
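Since Σ* is infinite, only a finite slice of it can ever be listed. The sketch below (Python, not part of the original slides) enumerates every string over the example alphabet {a, b, c, d} up to a chosen length and then picks out a subset as a language. The helper name `strings_up_to` and the "no d" language are illustrative assumptions.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length 0..max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple: the empty string.
        for symbols in product(sorted(alphabet), repeat=n):
            result.append("".join(symbols))
    return result

SIGMA = {"a", "b", "c", "d"}

# A finite slice of the closure of SIGMA: all strings of length at most 3.
closure_prefix = strings_up_to(SIGMA, 3)

# A hypothetical language: the strings in that slice that contain no "d".
language = [s for s in closure_prefix if "d" not in s]
```

The slice contains 1 + 4 + 16 + 64 = 85 strings, and the empty string is the first of them.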
Nonterminals
• The nonterminal set N is a finite set of symbols not in
the alphabet.
• A particular nonterminal, the goal symbol S,
represents exactly all the strings in the language.
• The goal symbol is also often called the start
symbol because derivations start with it.
• The set of terminals and the set of nonterminals, taken
together, are called the vocabulary of the grammar.
Productions
• The productions P of a grammar are a set of
rewriting rules, each written as two strings
of symbols separated by an arrow.
• The symbols on each side of the arrow may
be drawn from both terminals and
nonterminals, subject to certain restrictions
imposed by the form of the grammar.
An Example Grammar
• G1 = ({a, b, c}, {A, B}, {A → aB, A → bB,
A → cB, B → a, B → b, B → c}, A)
• The grammar generates 9 two-letter strings.
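The count of nine strings can be checked mechanically. The following Python sketch (an illustration, not taken from the slides) stores G1 as plain data and expands the leftmost nonterminal breadth-first until only terminals remain. It assumes every symbol is a single character, and it terminates only because G1 generates a finite language.

```python
from collections import deque

# G1 = (alphabet, nonterminals, productions, goal symbol)
TERMINALS = {"a", "b", "c"}
NONTERMINALS = {"A", "B"}
PRODUCTIONS = {
    "A": ["aB", "bB", "cB"],
    "B": ["a", "b", "c"],
}
START = "A"

def generate(productions, nonterminals, start):
    """Enumerate all sentences by rewriting the leftmost nonterminal."""
    sentences = set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if the form is a sentence.
        idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if idx is None:
            sentences.add(form)
            continue
        for rhs in productions[form[idx]]:
            queue.append(form[:idx] + rhs + form[idx + 1:])
    return sorted(sentences)

G1_SENTENCES = generate(PRODUCTIONS, NONTERMINALS, START)
```

Each sentence takes two rewriting steps, for example A ⇒ aB ⇒ ab.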
Syntax and Semantics
• Syntax: the syntax of a programming language is the form
of its expressions, statements, and program units.
• Semantics: the meaning of those expressions,
statements, and program units.
• Example: the form if (<expr>) <statement> is syntax; that
<statement> is executed only when <expr> is true is semantics.
Sentences, Lexeme, Token
• Sentences: the strings of a language are called
sentences or statements.
• Lexeme: the lexemes of a programming language
include its identifiers, literals, operators, and special words.
• Token: a token of a language is a category of its
lexemes.
Lexeme and Token
index = 2 * count + 17;

Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
*          multi_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
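The pairing of lexemes and tokens can be reproduced with a small scanner. This Python sketch is one possible way to do it, not the scanner used in the course: it builds a single regular expression with one named group per token category and reports the name of the group that matched. The token names follow the slide; the patterns and the `tokenize` function are assumptions made for illustration.

```python
import re

# One (token name, pattern) pair per token category.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("multi_op",    r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),   # whitespace separates lexemes and is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (lexeme, token) pairs for `text`."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

PAIRS = tokenize("index = 2 * count + 17;")
```

Both `index` and `count` map to the same token, `identifier`, which is what it means for a token to be a category of lexemes.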
The Role of Grammars
• The grammar of a language defines the
correct form for sentences in that language.
• Grammars are the formal language-generation
mechanism commonly used to
describe the syntax of programming
languages.
BNF: Backus-Naur Form
• Backus presented a new formal notation for
specifying programming language syntax.
• Naur modified the notation slightly.
• Known as Backus-Naur Form, or BNF.
• BNF is a very natural notation for
describing syntax.
• The terms BNF and context-free grammar (or simply
grammar) are used interchangeably.
BNF
• Metalanguage: a language used to describe another
language. BNF is a metalanguage for programming
languages.
• Abstraction: the symbol on the left-hand side of the arrow
• Definition: the text to the right of the arrow
• Rule (production): the abstraction and its definition
together are called a rule.
BNF Description
(A simple C assignment statement)
<assign> → <var> = <expression>
• Rule (production): the whole line
• LHS (left-hand side): the abstraction, <assign>
• RHS (right-hand side): the definition, <var> = <expression>
Nonterminal and Terminal
• Nonterminal symbol: an abstraction in a BNF description or
grammar
• Terminal symbol: a lexeme or token in the rules
• A BNF description or grammar is simply a
collection of rules.
• A nonterminal can have two or more distinct definitions.
• Multiple definitions can be written as a single rule, with the different
definitions separated by |, meaning logical OR.
<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>
List of Syntactic Elements
• BNF does not include the ellipsis (…)
• BNF uses recursion to describe lists
• A rule is recursive if its LHS appears in its
RHS.
• e.g., <ident_list> → identifier
                     | identifier , <ident_list>
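A recursive rule maps naturally onto a recursive function. The sketch below (Python, an illustration rather than anything from the slides) recognizes <ident_list> over a list of token strings: it accepts one identifier and, if a comma follows, calls itself for the rest of the list. The function name `parse_ident_list` and the use of `str.isidentifier` to decide what counts as an identifier are assumptions.

```python
def parse_ident_list(tokens, pos=0):
    """Recognize <ident_list> -> identifier | identifier , <ident_list>.

    Returns (names, next_pos), where next_pos is the index of the first
    token after the list. Raises SyntaxError if no identifier is found
    where the rule requires one.
    """
    if pos >= len(tokens) or not tokens[pos].isidentifier():
        raise SyntaxError(f"identifier expected at token {pos}")
    names = [tokens[pos]]
    pos += 1
    # Second alternative: a comma means the rule applies again (recursion).
    if pos < len(tokens) and tokens[pos] == ",":
        rest, pos = parse_ident_list(tokens, pos + 1)
        names += rest
    return names, pos

NAMES, END = parse_ident_list(["a", ",", "b", ",", "c"])
```

The depth of the recursion equals the number of identifiers, which is how the rule describes lists of any length without an ellipsis.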
A Grammar
A Derivation of a Program
Another Grammar
A Derivation of a Statement
Parse Tree
Grammars naturally describe
the hierarchical syntactic structure
of the sentences of the languages
they define.
These hierarchical structures are
called parse trees.
Ambiguous Grammar
• A grammar that generates a sentence for
which there are two or more distinct parse
trees is said to be ambiguous.
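Ambiguity can be made concrete by counting parse trees. The grammar used here, E → E + E | E * E | id, is a common textbook example chosen for illustration and is not necessarily the one shown on the slides. The Python sketch counts the distinct parse trees of a token sequence by trying every operator as the root of the tree and multiplying the counts for the two sides.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways tokens[i:j] can be derived from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try each operator strictly inside the span as the root production.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

TREES = count_parse_trees(["id", "+", "id", "*", "id"])
```

For id + id * id the count is 2: one tree groups id + id first and the other groups id * id first, so this grammar is ambiguous.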
Ambiguity
Regular Expressions
A regular expression is a notation for describing a set of strings.
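As a small illustration (not from the slides), the Python sketch below uses the standard `re` module to describe two sets of strings: identifiers and integer literals. The exact patterns are assumptions; real languages differ in which characters they allow.

```python
import re

# A letter or underscore followed by any number of letters, digits, underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# One or more decimal digits.
INT_LITERAL = re.compile(r"[0-9]+")

def is_identifier(text):
    """True if the whole of `text` is in the set described by IDENTIFIER."""
    return IDENTIFIER.fullmatch(text) is not None

def is_int_literal(text):
    """True if the whole of `text` is in the set described by INT_LITERAL."""
    return INT_LITERAL.fullmatch(text) is not None
```

With these definitions, `is_identifier("count")` and `is_int_literal("17")` are true, while `is_identifier("17")` and `is_int_literal("1a")` are false.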