LEXICAL SYNTAX

Grammars deal with units called tokens. A token is a sequence of characters having a collective meaning.

Tokens and Spellings

The syntax of a programming language is specified in terms of units called tokens or terminals. A lexical syntax for a language specifies the correspondence between the written representation of the language and the tokens or terminals in a grammar for the language.

Alphabetic character sequences that are treated as units in a language are called keywords. Keywords are reserved words if they cannot be used as names. Example: if and while are keywords in both Pascal and C.

The actual character sequence used to write down an occurrence of a token is called the spelling of that occurrence. Subscripts can be used to distinguish between occurrences of a token; the subscript might be the spelling. For a token number for integers (and a token name for names), the character sequence

b * b - 4 * a * c

is represented by the token sequence

name_b * name_b - number_4 * name_a * name_c

where the part after the underscore is the subscript.

White space in the form of blank, tab, and newline characters can typically be inserted between tokens without changing the meaning of a program. Comments between tokens are ignored.

CONTEXT-FREE GRAMMARS

The concrete syntax of a language describes its written representation, including lexical details such as the placement of keywords and punctuation marks. Context-free grammars, or simply grammars, are a notation for specifying concrete syntax. BNF (Backus-Naur Form) is a way of writing grammars.

Introduction to Grammars

A grammar for a language imposes a hierarchical structure, called a parse tree, on programs in the language.

Example: parse tree for the string 3.14 (the children of each node are listed beneath it, indented, in left-to-right order)

real-number
  integer-part
    digit
      3
  .
  fraction
    digit
      1
    fraction
      digit
        4

The leaves at the bottom of a parse tree are labeled with terminals or tokens. The other nodes of a parse tree are labeled with nonterminals.
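The parse tree for 3.14 can be written down as a small data structure to make the roles of leaves and inner nodes concrete. This is only a sketch: the (label, children) representation and the names TREE_3_14, leaves and nonterminals are assumptions made for illustration and do not come from the notes.

```python
# A parse tree node is a (label, children) pair; a leaf is a plain string.
# TREE_3_14 encodes the parse tree for 3.14 shown in the text.
# The representation and all names here are illustrative assumptions.
TREE_3_14 = (
    "real-number",
    [
        ("integer-part", [("digit", ["3"])]),
        ".",
        ("fraction", [
            ("digit", ["1"]),
            ("fraction", [("digit", ["4"])]),
        ]),
    ],
)


def leaves(node):
    """Return the terminals at the leaves, read from left to right."""
    if isinstance(node, str):       # a leaf is labeled with a terminal
        return [node]
    _label, children = node         # an inner node is labeled with a nonterminal
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


def nonterminals(node):
    """Return the labels of all inner nodes, in preorder."""
    if isinstance(node, str):
        return []
    label, children = node
    found = [label]
    for child in children:
        found.extend(nonterminals(child))
    return found


print("".join(leaves(TREE_3_14)))   # prints 3.14
```

Reading the leaves from left to right recovers the original string, while every inner label is one of the nonterminals real-number, integer-part, fraction or digit.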
Each node in the parse tree is based on a production, a rule that defines a nonterminal in terms of a sequence of terminals and nonterminals.

Definition of Context-Free Grammars

A context-free grammar, or simply grammar, has four parts:

1) A set of tokens or terminals; these are the atomic symbols in the language.
2) A set of nonterminals; these are the variables representing constructs in the language.
3) A set of rules called productions for identifying the components of a construct. Each production has a nonterminal as its left side, the symbol ::=, and a string over the sets of terminals and nonterminals as its right side.
4) A nonterminal chosen as the starting nonterminal; it represents the main construct of the language.

BNF (Backus-Naur Form)

BNF is one notation used to write grammars.

Terminals and Nonterminals

In BNF, nonterminals are enclosed between the special symbols < and >, and the empty string is written as <empty>. Terminals consisting of symbols like + and * usually appear as is, but they can be quoted for emphasis.

Productions

Read the symbol ::= as "can be" and read the symbol | as "or".

BNF Rules for Real Numbers

<real-number> ::= <integer-part> . <fraction>

In words, a real number has an integer part, a decimal point, and a fractional part.

<integer-part> ::= <digit> | <integer-part> <digit> | <empty>

The integer part can be a digit, or it can be an integer part followed by a digit, or it can be the empty string.

<fraction> ::= <digit> | <digit> <fraction>

A fraction can be a digit, or it can be a digit followed by a fraction.

<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The variable <empty> represents the empty string, of length 0. It is useful for specifying optional constructs.

In the above example, the nonterminals are <real-number>, <integer-part>, <fraction>, and <digit>. The tokens are the digits 0, 1, ..., 9 and the decimal point.

Example:

<identifier> ::= <letter> | <letter> <digit>
<letter> ::= a | b | c | ... | z | A | B | C | ... | Z
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Parse Trees

The productions in a grammar are rules for building strings of tokens. A parse tree shows how a string can be built. A parse tree with respect to a grammar is a tree satisfying the following:

1) Each leaf is labeled with a terminal or <empty>, representing the empty string.
2) Each nonleaf node is labeled with a nonterminal.
3) The label of a nonleaf node is the left side of some production, and the labels of the children of the node, from left to right, form the right side of that production.
4) The root is labeled with the starting nonterminal.

A parse tree generates the string formed by reading the terminals at its leaves from left to right. A string is in a language if and only if it is generated by some parse tree. The construction of a parse tree is called parsing.

Example: parse tree for the string 123.789

real-number
  integer-part
    integer-part
      integer-part
        digit
          1
      digit
        2
    digit
      3
  .
  fraction
    digit
      7
    fraction
      digit
        8
      fraction
        digit
          9

Syntactic Ambiguity

A grammar for a language is syntactically ambiguous, or simply ambiguous, if some string in its language has more than one parse tree. Programming languages can usually be described by unambiguous grammars. If ambiguities exist, they are resolved by establishing conventions that rule out all but one parse tree for each string.

The following grammar is ambiguous, since the string 1 - 0 - 1 has two parse trees, corresponding to the parenthesizations (1 - 0) - 1 and 1 - (0 - 1):

E ::= E - E | 0 | 1

The two possible parse trees for 1 - 0 - 1 are

(1) corresponding to (1 - 0) - 1

E
  E
    E
      1
    -
    E
      0
  -
  E
    1

(2) corresponding to 1 - (0 - 1)

E
  E
    1
  -
  E
    E
      0
    -
    E
      1

Dangling-Else Ambiguity

A well-known example of syntactic ambiguity is the dangling-else ambiguity, which arises if a grammar has the two productions

S ::= if E then S
S ::= if E then S else S

where S represents statements and E represents expressions. Neither production by itself leads to an ambiguity.
Together, however, they permit constructions like the following, where it is not clear to which if an else belongs:

if E1 then if E2 then S1 else S2

(1) Here the else is matched with the nearest unmatched if, so the outer if has no else part:

S
  if
  E1
  then
  S
    if
    E2
    then
    S1
    else
    S2

(2) Here the else is matched with the first if, so the inner if has no else part:

S
  if
  E1
  then
  S
    if
    E2
    then
    S1
  else
  S2

The dangling-else ambiguity is typically resolved by matching an else with the nearest unmatched if; that is, each else belongs to the innermost if that does not yet have one.

Derivations

A derivation consists of a sequence of strings, beginning with the starting nonterminal. Each successive string is obtained by replacing a nonterminal by the right side of one of its productions. A derivation ends with a string consisting entirely of terminals.

Example: The following derivation begins with the starting nonterminal real-number and ends with the string of terminals 21.89. In each snapshot, the leftmost nonterminal is replaced by the right side of one of its productions:

real-number
=> integer-part . fraction
=> integer-part digit . fraction
=> digit digit . fraction
=> 2 digit . fraction
=> 2 1 . fraction
=> 2 1 . digit fraction
=> 2 1 . 8 fraction
=> 2 1 . 8 digit
=> 2 1 . 8 9

GRAMMARS FOR EXPRESSIONS

A well-designed grammar can make it easy to pick out the meaningful components of a construct.

Lists in Infix Expressions

An expression a + b + c + d can be treated as a list of elements separated by + symbols; the elements are called terms. Since * has higher precedence than +, the expression a * b + c * d + e can be viewed as a list of terms a * b, c * d and e separated by + symbols. A term a * b can itself be treated as a list of elements separated by * symbols; the elements are called factors.

Nonterminals E, T, and F represent expressions, terms, and factors respectively.
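The list view of a * b + c * d + e can be illustrated with a few lines of Python. This is a minimal sketch that assumes a parenthesis-free expression containing only + and *; the function name split_into_terms_and_factors is hypothetical and not taken from the notes.

```python
def split_into_terms_and_factors(expression):
    """Split at '+' into terms, then split each term at '*' into factors.

    Assumption: the expression has no parentheses and uses only '+' and '*'.
    """
    terms = [term.strip() for term in expression.split("+")]
    return [[factor.strip() for factor in term.split("*")] for term in terms]


print(split_into_terms_and_factors("a * b + c * d + e"))
# prints [['a', 'b'], ['c', 'd'], ['e']]
```

The outer list holds the terms and each inner list holds the factors of one term, which mirrors the fact that * binds more tightly than +.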
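A grammar for arithmetic expressions:

E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= number | name | ( E )

For each of the following strings, draw a parse tree with respect to the grammar for arithmetic expressions.

(a) 2 + 3

Ans.

E
  E
    T
      F
        2
  +
  T
    F
      3

(b) (2 + 3)

Ans.

E
  T
    F
      (
      E
        E
          T
            F
              2
        +
        T
          F
            3
      )

(c) 2 + 3 * 5

Ans.

E
  E
    T
      F
        2
  +
  T
    T
      F
        3
    *
    F
      5

(d) (2 + 3) * 5

Ans.

E
  T
    T
      F
        (
        E
          E
            T
              F
                2
          +
          T
            F
              3
        )
    *
    F
      5

(e) 2 + (3 * 5)

Ans.

E
  E
    T
      F
        2
  +
  T
    F
      (
      E
        T
          T
            F
              3
          *
          F
            5
      )

Handling Associativity

The following is a suitable grammar for the left-associative operators + and - applied to a sequence of numbers:

L ::= L + number | L - number | number

A grammar that makes the operators right associative:

R ::= number + R | number - R | number

Example: parse trees for 4 - 2 - 1

With the left-associative grammar, grouping as (4 - 2) - 1:

L
  L
    L
      number_4
    -
    number_2
  -
  number_1

With the right-associative grammar, grouping as 4 - (2 - 1):

R
  number_4
  -
  R
    number_2
    -
    R
      number_1

University Questions

The following grammar is based on the syntax of statements in Pascal:

S ::= id := expr
    | if expr then S
    | if expr then S else S
    | while expr do S
    | begin SL end
SL ::= S ; | S ; SL

Draw parse trees for each of the following. The strings in (b), (d) and (e) omit the semicolon that the SL productions call for; the trees follow the question and show SL deriving S directly.

(a) while expr do id := expr

Ans.

S
  while
  expr
  do
  S
    id
    :=
    expr

(b) begin id := expr end

Ans.

S
  begin
  SL
    S
      id
      :=
      expr
  end

(c) if expr then if expr then S else S

Ans. The string has two parse trees. Either the else belongs to the inner if:

S
  if
  expr
  then
  S
    if
    expr
    then
    S
    else
    S

or the else belongs to the outer if:

S
  if
  expr
  then
  S
    if
    expr
    then
    S
  else
  S

(d) if expr then begin id := expr end

Ans.

S
  if
  expr
  then
  S
    begin
    SL
      S
        id
        :=
        expr
    end

(e) while expr do if expr then begin id := expr end

Ans.

S
  while
  expr
  do
  S
    if
    expr
    then
    S
      begin
      SL
        S
          id
          :=
          expr
      end

The following grammar generates numbers in binary notation:

C ::= C 0 | A 1 | 0
A ::= B 0 | C 1 | 1
B ::= A 0 | B 1

(a) Show that the generated numbers are all multiples of 3.

Ans. Appending a bit b to a binary number n gives the number 2n + b. Reading each production this way shows that C generates numbers with remainder 0 modulo 3, A numbers with remainder 1, and B numbers with remainder 2. For example, C ::= A 1 turns a remainder of 1 into 2 * 1 + 1 = 3, which has remainder 0, and B ::= B 1 turns a remainder of 2 into 2 * 2 + 1 = 5, which has remainder 2. Taking C as the starting nonterminal, as the derivations below do, every generated number is therefore a multiple of 3.

Sample derivations:

C => A 1 => 1 1, and (11)2 = (3)10, a multiple of 3.
C => C 0 => C 0 0 => A 1 0 0 => B 0 1 0 0 => A 0 0 1 0 0 => 1 0 0 1 0 0, and (100100)2 = (36)10, a multiple of 3.
C => A 1 => B 0 1 => A 0 0 1 => 1 0 0 1, and (1001)2 = (9)10, a multiple of 3.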
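The claim about the binary grammar can also be checked mechanically for short strings. The sketch below enumerates every string that C, A and B generate up to a chosen length and tests the remainders modulo 3. It is an illustration only: the encoding of productions as (left nonterminal, trailing bit) pairs and the names PRODUCTIONS and generate are assumptions, and a finite check supports the argument without replacing it.

```python
# Each production is encoded as (left nonterminal or None, trailing bit).
# For example C ::= A 1 becomes ("A", "1") and C ::= 0 becomes (None, "0").
# This encoding and the names below are illustrative assumptions.
PRODUCTIONS = {
    "C": [("C", "0"), ("A", "1"), (None, "0")],
    "A": [("B", "0"), ("C", "1"), (None, "1")],
    "B": [("A", "0"), ("B", "1")],
}


def generate(max_len):
    """Return {nonterminal: set of generated bit strings of length <= max_len}."""
    # by_len[nt][k] holds the strings of length k generated by nt;
    # index 0 is an empty set because no nonterminal generates the empty string.
    by_len = {nt: [set()] for nt in PRODUCTIONS}
    for length in range(1, max_len + 1):
        for nt, rules in PRODUCTIONS.items():
            strings = set()
            for left, bit in rules:
                if left is None:
                    if length == 1:
                        strings.add(bit)
                else:
                    # Extend every shorter string of the left nonterminal by one bit.
                    strings.update(s + bit for s in by_len[left][length - 1])
            by_len[nt].append(strings)
    return {nt: set().union(*levels) for nt, levels in by_len.items()}


languages = generate(8)
print(all(int(s, 2) % 3 == 0 for s in languages["C"]))   # prints True
```

The same dictionary can be used to confirm that the strings generated by A and B leave remainders 1 and 2 respectively, which is the invariant the argument relies on.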