LEXICAL SYNTAX
 Grammars deal with units called tokens.
 A token is a sequence of characters having a collective meaning.
Tokens and Spellings
 The syntax of a programming language is specified in terms of units called tokens or terminals.
 A lexical syntax for a language specifies the correspondence between the written representation of
the language and the tokens or terminals in a grammar for the language.
 Alphabetic character sequences that are treated as units in a language are called keywords.
 Keywords are reserved words if they cannot be used as names.
Example: if and while are keywords in both Pascal and C.
 The actual character sequence used to write down an occurrence of a token is called the spelling of
that occurrence.
 Subscripts can be used to distinguish between occurrences of a token; the subscript might be the
spelling.
Using a token number for integers and a token name for names, the character sequence
b * b – 4 * a * c
is represented by the token sequence
name_b * name_b – number_4 * name_a * name_c
 White space in the form of blank, tab, and newline characters can typically be inserted between tokens
without changing the meaning of a program.
 Comments between tokens are ignored.
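The correspondence between spellings and tokens can be sketched with a small tokenizer. This is a minimal illustration, not part of the original notes: the token classes (name, number, and single-character operators), the regular expression, and the function name tokenize are all assumptions, and the ASCII hyphen stands in for the minus sign.

```python
import re

# Assumed token classes for illustration: integer numbers, names, and
# single-character operators. Leading white space is skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<number>\d+)|(?P<name>[A-Za-z]\w*)|(?P<op>[-+*/()]))"
)

def tokenize(text):
    """Return a list of (token, spelling) pairs for the input text."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize input at position {pos}")
        kind = match.lastgroup            # "number", "name", or "op"
        spelling = match.group(kind)
        # Operators stand for themselves, so the token is the spelling.
        tokens.append((spelling if kind == "op" else kind, spelling))
        pos = match.end()
    return tokens

for token, spelling in tokenize("b * b - 4 * a *c"):
    print(token, spelling)
```

Each pair keeps the spelling next to the token, which plays the role of the subscript in the notation above. Removing or adding blanks between tokens leaves the token sequence unchanged.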
CONTEXT – FREE GRAMMARS
 The concrete syntax of a language describes its written representation, including lexical details such
as placement of keywords and punctuation marks.
 Context-free grammars, or simply grammars, are a notation for specifying concrete syntax. BNF
(Backus–Naur Form) is a way of writing grammars.
Introduction to Grammars
 A grammar for a language imposes a hierarchical structure, called a parse tree on programs in the
language.
Example: parse tree for the string 3.14 (children are indented under their parent)
real-number
  integer-part
    digit
      3
  .
  fraction
    digit
      1
    fraction
      digit
        4
 The leaves at the bottom of a parse tree are labeled with terminals or tokens.
 The other nodes of a parse tree are labeled with nonterminals. Each node in the parse tree is based
on a production, a rule that defines a nonterminal in terms of a sequence of terminals and
nonterminals.
Definition of Context – Free Grammars
A context-free grammar, or simply grammar, has four parts:
1) A set of tokens or terminals; these are the atomic symbols in the language.
2) A set of nonterminals; these are the variables representing constructs in the language.
3) A set of rules called productions for identifying the components of a construct. Each
production has a nonterminal as its left side, the symbol ::=, and a string over the sets of
terminals and nonterminals as its right side.
4) A nonterminal chosen as the starting nonterminal; it represents the main construct of the
language.
BNF (Backus – Naur Form)
 BNF is one notation used to write grammars.
Terminals and Nonterminals
 In BNF, nonterminals are enclosed between the special symbols < and >, and the empty string is
written as <empty>.
 Terminals consisting of symbols like + and * usually appear as is, but they can be quoted for emphasis.
Productions
 Read the symbol ::= as “can be” and read the symbol | as “or”.
BNF Rules for Real Numbers
<real-number> ::= <integer-part> . <fraction>
In words, a real number has an integer part, a decimal point, and a fractional part.
<integer-part> ::= <digit> | <integer-part> <digit> | <empty>
The integer part can be a digit, or it can be an integer part followed by a digit, or it can be the
empty string.
<fraction> ::= <digit> | <digit> <fraction>
A fractional part can be a digit, or it can be a digit followed by a fractional part.
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The variable <empty> represents the empty string, of length 0. It is useful for specifying
optional constructs.
In the above example, the nonterminals are <real-number>, <integer-part>, <fraction>, and <digit>.
The tokens are the digits 0, 1, ..., 9 and the decimal point.
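The BNF rules for real numbers can be checked mechanically. The sketch below is a hand-written recognizer, assuming the reading that an integer part derives zero or more digits and a fraction derives one or more digits; the function name is_real_number is hypothetical.

```python
DIGITS = set("0123456789")

def is_real_number(s):
    """Check s against <real-number> ::= <integer-part> . <fraction>."""
    # <integer-part> derives zero or more digits (it may be <empty>).
    i = 0
    while i < len(s) and s[i] in DIGITS:
        i += 1
    # The decimal point is required.
    if i == len(s) or s[i] != ".":
        return False
    fraction = s[i + 1:]
    # <fraction> derives one or more digits.
    return len(fraction) > 0 and all(ch in DIGITS for ch in fraction)

print(is_real_number("3.14"))  # True
print(is_real_number(".5"))    # True: the integer part is <empty>
print(is_real_number("3."))    # False: the fraction needs a digit
```

Because <integer-part> may be empty but <fraction> may not, the grammar accepts .5 and rejects 3.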
Example:
<identifier>::=<letter>|<letter><digit>
<letter>::=a|b|c . . . |z|A|B|C . . .|Z
<digit>::=0|1|2|3|4|5|6|7|8|9
Parse Trees
 The productions in a grammar are rules for building strings of tokens.
 A parse tree shows how a string can be built.
 A parse tree with respect to a grammar is a tree satisfying the following:
1) Each leaf is labeled with a terminal or <empty>, representing the empty string.
2) Each nonleaf node is labeled with a nonterminal.
3) The label of a nonleaf node is the left side of some production, and the labels of the children
of the node, from left to right, form the right side of that production.
4) The root is labeled with the starting nonterminal.
 A parse tree generates the string formed by reading the terminals at its leaves from left to right. A
string is in a language if and only if it is generated by some parse tree.
 The construction of a parse tree is called parsing.
Example: parse tree for the string 123.789 (children are indented under their parent)
real-number
  integer-part
    integer-part
      integer-part
        digit
          1
      digit
        2
    digit
      3
  .
  fraction
    digit
      7
    fraction
      digit
        8
      fraction
        digit
          9
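A parse tree and the string it generates can be represented directly. This is a small sketch under assumed names (the class Node and the function generated_string are not from the notes); it builds the tree for 3.14 under the real-number grammar and reads its leaves from left to right.

```python
class Node:
    """A parse-tree node: a label plus an ordered list of children."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)

def generated_string(node):
    """Read the terminals at the leaves from left to right."""
    if not node.children:
        # A leaf labeled <empty> contributes the empty string.
        return "" if node.label == "<empty>" else node.label
    return "".join(generated_string(child) for child in node.children)

# Parse tree for 3.14 under the real-number grammar.
tree = Node("real-number",
            Node("integer-part", Node("digit", Node("3"))),
            Node("."),
            Node("fraction",
                 Node("digit", Node("1")),
                 Node("fraction", Node("digit", Node("4")))))

print(generated_string(tree))  # 3.14
```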
Syntactic Ambiguity
 A grammar for a language is syntactically ambiguous, or simply ambiguous, if some string in its
language has more than one parse tree.
 Programming languages can usually be described by unambiguous grammars.
 If ambiguities exist, they are resolved by establishing conventions that rule out all but one parse tree
for each string.
The following grammar is ambiguous, since the string 1 – 0 – 1 has two parse trees, corresponding to
the parenthesizations (1 – 0) – 1 and 1 – (0 – 1)
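E ::= E – E | 0 | 1
The two possible parse trees for 1 – 0 – 1 are shown below, with children indented under their parent.
(1) Corresponding to (1 – 0) – 1:
E
  E
    E
      1
    –
    E
      0
  –
  E
    1
(2) Corresponding to 1 – (0 – 1):
E
  E
    1
  –
  E
    E
      0
    –
    E
      1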
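The ambiguity of E ::= E – E | 0 | 1 can be confirmed by counting parse trees. The sketch below is one way to do so, assuming tokens are given as a list of strings and using the ASCII hyphen for the minus sign; the function name count_parse_trees is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under E ::= E - E | 0 | 1."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways E can generate tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] in ("0", "1") else 0
        total = 0
        # Try every "-" as the operator at the root: E ::= E - E.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] == "-":
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["1", "-", "0", "-", "1"]))  # 2
print(count_parse_trees(["1", "-", "0"]))            # 1
```

A count greater than 1 for any string is enough to show that the grammar is ambiguous.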
Dangling – Else Ambiguity
 A well-known example of syntactic ambiguity is the dangling-else ambiguity, which arises if a
grammar has the two productions:
S ::= if E then S
S ::= if E then S else S
where S represents statements and E represents expressions. Neither production by itself leads to an
ambiguity. Together, however, they permit constructions like the following, where it is not clear to
which if an else belongs:
if E1 then if E2 then S1 else S2
The two parse trees are shown below, with children indented under their parent.
(1) Here the else is matched with the nearest unmatched if; the outer if has no else part:
S
  if
  E1
  then
  S
    if
    E2
    then
    S1
    else
    S2
(2) Here the else is matched with the first if; the inner if has no else part:
S
  if
  E1
  then
  S
    if
    E2
    then
    S1
  else
  S2
 The dangling-else ambiguity is typically resolved by matching an else with the nearest unmatched if.
 Equivalently, each else is paired with the innermost if statement that does not yet have an else
part.
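The nearest-unmatched-if convention falls out naturally from a recursive-descent parser that takes an else as soon as it sees one. The following is a minimal sketch, assuming well-formed input where expressions and simple statements are single tokens such as E1 and S1; the function name parse_statement and the tuple representation are assumptions.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement starting at tokens[pos].

    Returns (tree, next_pos). A tree is either a plain string for a
    simple statement, or ("if", cond, then_branch, else_branch_or_None).
    """
    if tokens[pos] != "if":
        return tokens[pos], pos + 1          # simple statement such as S1
    cond = tokens[pos + 1]                   # expression token such as E1
    if tokens[pos + 2] != "then":
        raise SyntaxError("expected 'then'")
    then_branch, pos = parse_statement(tokens, pos + 3)
    else_branch = None
    # Greedy choice: an else seen here is taken by the innermost open if.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_statement(tokens, pos + 1)
    return ("if", cond, then_branch, else_branch), pos

tree, _ = parse_statement("if E1 then if E2 then S1 else S2".split())
print(tree)
# ('if', 'E1', ('if', 'E2', 'S1', 'S2'), None)
```

The result has the else attached to the inner if, and the outer if has no else part.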
Derivations
A derivation consists of a sequence of strings, beginning with the starting nonterminal. Each
successive string is obtained by replacing a nonterminal by the right side of one of its productions. A
derivation ends with a string consisting entirely of terminals.
Example:
The following derivation begins with the starting nonterminal real – number and ends with the string
of terminals 21.89. In each snapshot, the leftmost nonterminal is replaced by the right side of one of
its productions:
real – number integer – part . fraction
integer – part . digit . fraction
digit digit . fraction
2 digit . fraction
2 1 . fraction
2 1 . digit fraction
2 1 . 8 fraction
2 1 . 8 digit
2 1 . 8 9
GRAMMARS FOR EXPRESSIONS
 A well-designed grammar can make it easy to pick out the meaningful components of a construct.
Lists in Infix Expressions
An expression a + b + c + d can be treated as a list of elements separated by + symbols; the elements
are called terms.
 Since * has higher precedence than +, the expression a * b + c * d + e can be viewed as a list of terms
a*b, c * d and e separated by + symbols.
 A term a * b can itself be treated as a list of elements separated by * symbols; the elements are called
factors.
 Nonterminals E, T, and F represent expressions, terms, and factors respectively.
A grammar for arithmetic expressions
E ::= E + T | E – T | T
T ::= T * F | T / F | F
F ::= number | name | ( E )
 For each of the following strings, draw a parse tree with respect to the grammar for arithmetic
expressions
(a) 2 + 3
Ans.
E
  E
    T
      F
        2
  +
  T
    F
      3
(b) (2 + 3)
Ans.
E
  T
    F
      (
      E
        E
          T
            F
              2
        +
        T
          F
            3
      )
(c) 2 + 3 * 5
Ans.
E
  E
    T
      F
        2
  +
  T
    T
      F
        3
    *
    F
      5
(d) (2 + 3) * 5
Ans.
E
  T
    T
      F
        (
        E
          E
            T
              F
                2
          +
          T
            F
              3
        )
    *
    F
      5
(e) 2 + (3 * 5)
Ans.
E
  E
    T
      F
        2
  +
  T
    F
      (
      E
        T
          T
            F
              3
          *
          F
            5
      )
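The grouping imposed by the grammar for arithmetic expressions can be checked with a small parser. The sketch below is a recursive-descent parser in which loops stand in for the left-recursive productions E ::= E + T and T ::= T * F (a common substitution, not the grammar as written). It returns nested tuples (operator, left, right) rather than full parse trees, uses the ASCII hyphen for minus, and silently ignores characters it does not recognize; all function names are assumptions.

```python
import re

def parse_expression(text):
    """Parse text according to the E, T, F grammar; return nested tuples."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():                      # F ::= number | name | ( E )
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok == "(":
            tree = expression()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        if tok in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def term():                        # T ::= T * F | T / F | F
        tree = factor()
        while peek() in ("*", "/"):
            op = advance()
            tree = (op, tree, factor())
        return tree

    def expression():                  # E ::= E + T | E - T | T
        tree = term()
        while peek() in ("+", "-"):
            op = advance()
            tree = (op, tree, term())
        return tree

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse_expression("2 + 3 * 5"))    # ('+', '2', ('*', '3', '5'))
print(parse_expression("(2 + 3) * 5"))  # ('*', ('+', '2', '3'), '5')
```

Because terms are parsed inside expressions, * binds more tightly than + unless parentheses say otherwise, matching the parse trees for exercises (c) and (d).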
Handling Associativity
 The following is a suitable grammar for left associative operators + and – applied to a sequence of
numbers:
L ::= L + number
| L – number
| number
Right associative grammar
R ::= number + R
| number – R
| number
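Example:
4 – 2 – 1
Parse trees (children are indented under their parent)
Left associative, grouping as (4 – 2) – 1:
L
  L
    L
      number4
    –
    number2
  –
  number1
Right associative, grouping as 4 – (2 – 1):
R
  number4
  –
  R
    number2
    –
    R
      number1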
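The two groupings give different values for 4 – 2 – 1, which is why associativity matters. The sketch below evaluates a list of numbers and operators both ways; the function names evaluate_left and evaluate_right are assumptions, and the ASCII hyphen stands for minus.

```python
def evaluate_left(numbers, ops):
    """Group as L ::= L op number, for example (4 - 2) - 1."""
    result = numbers[0]
    for op, n in zip(ops, numbers[1:]):
        result = result + n if op == "+" else result - n
    return result

def evaluate_right(numbers, ops):
    """Group as R ::= number op R, for example 4 - (2 - 1)."""
    if not ops:
        return numbers[0]
    rest = evaluate_right(numbers[1:], ops[1:])
    return numbers[0] + rest if ops[0] == "+" else numbers[0] - rest

print(evaluate_left([4, 2, 1], ["-", "-"]))   # 1
print(evaluate_right([4, 2, 1], ["-", "-"]))  # 3
```

The left associative reading gives the conventional arithmetic result.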
University Questions
 The following grammar is based on the syntax of statements in Pascal:
S ::= id := expr
| if expr then S
| if expr then S else S
| while expr do S
| begin SL end
SL ::= S
| S ; SL
Draw parse trees for each of the following:
(a) while expr do id := expr
Ans.
S
  while
  expr
  do
  S
    id
    :=
    expr
(b) begin id := expr end
Ans.
S
  begin
  SL
    S
      id
      :=
      expr
  end
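(c) if expr then if expr then S else S
Ans. The string is ambiguous and has two parse trees. With the else attached to the outer if:
S
  if
  expr
  then
  S
    if
    expr
    then
    S
  else
  S
or, with the else attached to the inner if:
S
  if
  expr
  then
  S
    if
    expr
    then
    S
    else
    S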
(d) if expr then begin id := expr end
Ans.
S
  if
  expr
  then
  S
    begin
    SL
      S
        id
        :=
        expr
    end
(e) while expr do if expr then begin id := expr end
Ans.
S
  while
  expr
  do
  S
    if
    expr
    then
    S
      begin
      SL
        S
          id
          :=
          expr
      end
 The following grammar generates numbers in binary notation:
C ::= C 0 | A 1 | 0
A ::= B 0 | C 1 | 1
B ::= A 0 | B 1
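(a) Show that the generated numbers are all multiples of 3.
Ans. Sample derivations from C:
C => A1 => 11, which is 3 in decimal, a multiple of 3.
C => C0 => C00 => A100 => B0100 => A00100 => 100100, which is 36 in decimal, a multiple of 3.
C => A1 => B01 => A001 => 1001, which is 9 in decimal, a multiple of 3.
In general, appending 0 to a binary numeral doubles its value, and appending 1 doubles it and adds 1.
By induction on the length of a derivation, every string generated by C has remainder 0 modulo 3,
every string generated by A has remainder 1, and every string generated by B has remainder 2. For
example, A1 has remainder 2 * 1 + 1 = 3, that is, 0. Hence every number generated from C is a
multiple of 3.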