Constructing Lexical Analysers

Converting NFAs to DFAs
How a Syntax Analyser is constructed
Converting an nfa to a dfa
Defn: The e-closure of a state S is the set of all states,
including S itself, that you can get to from S via e-transitions alone.
The e-closure of state S is denoted: S
Example:
The e-closure of state 1 = { 1, 2, 4 }
The e-closure of state 3 = { 3, 2, 4 }
Defn: The e-closure of a set of states S1, ... , Sn is the union
of the e-closures of the individual states:
e-closure(S1) ∪ e-closure(S2) ∪ ... ∪ e-closure(Sn).
Example: The e-closure of the set of states 1 and 3 above is
{ 1, 2, 4 } ∪ { 3, 2, 4 } = { 1, 2, 3, 4 }
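The two definitions above can be sketched in Python as follows. The nfa figure is not reproduced in this transcript, so the e-transitions in EPS are an assumption, chosen only so that the closures of states 1 and 3 match the example; the function names are also illustrative.

```python
def e_closure(state, eps):
    """Set of states reachable from `state` using e-transitions alone."""
    closure = {state}          # a state is always in its own e-closure
    stack = [state]
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def e_closure_of_set(states, eps):
    """Union of the e-closures of the individual states."""
    result = set()
    for s in states:
        result |= e_closure(s, eps)
    return result


# Assumed e-transitions (not taken from the lost figure): 1 -> 2, 2 -> 4, 3 -> 2
EPS = {1: {2}, 2: {4}, 3: {2}}

closure_1 = e_closure(1, EPS)
closure_3 = e_closure(3, EPS)
closure_1_and_3 = e_closure_of_set({1, 3}, EPS)
```

With these assumed transitions, closure_1 and closure_3 are the two sets given in the example, and closure_1_and_3 is their union.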
To construct a dfa from an nfa
Step 1: Let the start state of the dfa be formed from the
e-closure of the start state of the nfa.
Subsequent steps: If S is any state that you have previously
constructed for the dfa and it is formed from, say, states t1, ... , tr
of the nfa, then for any symbol x for which at least one of the
states t1, ... , tr has an x-successor, the x-successor of S is the
e-closure of the set of all x-successors of t1, ... , tr.
Repeat until no new dfa states are produced.
Any state of the dfa which is formed from an accepting state,
among others, of the nfa becomes an accepting state.
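A minimal Python sketch of the construction just described. The nfa used here is a hypothetical one for an identifier (a letter followed by letters or digits); it is an assumption for illustration and is not the nfa from the slides, whose figures are missing. Each dfa state is represented as a frozenset of nfa states.

```python
from collections import deque


def e_closure(states, eps):
    """e-closure of a set of nfa states, returned as a frozenset."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def nfa_to_dfa(start, accepting, delta, eps):
    """delta maps (nfa state, symbol) -> set of nfa states; eps maps state -> set."""
    dfa_start = e_closure({start}, eps)          # Step 1
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:                                 # Subsequent steps
        S = queue.popleft()
        # symbols for which at least one member of S has a successor
        symbols = sorted({x for (t, x) in delta if t in S})
        for x in symbols:
            targets = set()
            for t in S:
                targets |= delta.get((t, x), set())
            T = e_closure(targets, eps)
            dfa_delta[(S, x)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    # a dfa state is accepting if it contains any accepting nfa state
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_start, seen, dfa_accepting, dfa_delta


# Hypothetical identifier nfa: 1 -letter-> 2 -e-> 3, 3 -letter-> 4 -e-> 3,
# 3 -digit-> 5 -e-> 3, with state 3 accepting.
DELTA = {(1, "letter"): {2}, (3, "letter"): {4}, (3, "digit"): {5}}
EPS = {2: {3}, 4: {3}, 5: {3}}

start, states, accepting, trans = nfa_to_dfa(1, {3}, DELTA, EPS)
```

For this assumed nfa the result has four dfa states, one non-accepting start state and three accepting states, and none of its transitions is an e-transition.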
Example 1: To convert the following nfa:
[Figure: the nfa; only a transition label b and a state 5 survive]
we get:
[Figure: the resulting dfa]
This construction yields a dfa that has no e-transitions.
Example 2: To convert the nfa for an identifier to a dfa
we get:
Minimizing the Number of States in a DFA
Step 1: Start with two sets of states
(a) all the accepting states, and
(b) all the non-accepting states
Subsequent steps: Given the sets of states S1, ... , Sr, consider
each set S and each symbol x in turn. If any member of S has an
x-successor and this x-successor is in, say, S', then unless all the
members of S have x-successors that are in S', split up S into
those members whose x-successors are in S' and the others
(which don't have x-successors in S').
Repeat until no set can be split any further. Each remaining set
then becomes a single state of the minimized dfa.
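The splitting rule above can be sketched in Python as below. The function names and both example dfas are assumptions: the transitions are hypothetical, picked to be consistent with the two examples discussed in these slides (a four-state identifier dfa, and a three-state dfa in which only state 1 has an a-successor), since the figures themselves are not available.

```python
def find_split(partition, delta, symbols):
    """Return (S, members of S whose x-successor lies in some S'), or None."""
    for S in partition:
        for x in symbols:
            for target in partition:
                # delta.get(...) is None when a state has no x-successor
                inside = {s for s in S if delta.get((s, x)) in target}
                if inside and inside != S:
                    return S, inside
    return None


def minimize(states, accepting, delta, symbols):
    """delta maps (state, symbol) -> state. Returns the final list of sets."""
    accepting = set(accepting)
    # Step 1: non-accepting states and accepting states (drop an empty set)
    partition = [p for p in (set(states) - accepting, accepting) if p]
    while True:
        split = find_split(partition, delta, symbols)
        if split is None:
            return partition
        S, inside = split
        partition.remove(S)
        partition += [inside, S - inside]


# Hypothetical identifier dfa: state 1 is the start state, 2, 3, 4 accept.
IDENT = {(1, "letter"): 2,
         (2, "letter"): 3, (2, "digit"): 4,
         (3, "letter"): 3, (3, "digit"): 4,
         (4, "letter"): 3, (4, "digit"): 4}
ident_sets = minimize({1, 2, 3, 4}, {2, 3, 4}, IDENT, ["letter", "digit"])

# Hypothetical all-accepting dfa in which only state 1 has an a-successor.
ALL_ACC = {(1, "a"): 2, (2, "b"): 3, (3, "b"): 3}
all_acc_sets = minimize({1, 2, 3}, {1, 2, 3}, ALL_ACC, ["a", "b"])
```

In the first case no set is ever split, so the three accepting states collapse into one; in the second, the single starting set is split into state 1 and states 2 and 3.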
Example 1. Consider the dfa we constructed for an identifier
(with renumbered states):
The sets of states for this dfa are:
S1 (Nonaccepting states): 1
S2 (Accepting states): 2, 3, 4
All states in S2 have a letter-successor and a digit-successor,
and the successor states are all in the set of states S2, so S2 is not split.
Combine all the states of S2 into a single state to get:
Example 2. Consider the dfa:
All of the states (1, 2, and 3) are accepting states and all the
successors are also accepting states, but state 1 has an
a-successor whereas states 2 and 3 do not.
So, we split the set of accepting states into two sets S1 and S2,
where S1 consists of state 1 and S2 consists of states 2 and 3,
to get:
HOW LEX WORKS
Using the methods described above, Lex constructs a
minimized finite automaton for each regular
expression in the definition file.
Lex generates a C program, which we will refer to as
lex.yy.c
The finite automata are represented in lex.yy.c by a
set of arrays.
For instance, a portion of a finite automaton such as:
4 --+--> 7
can be represented by entering, in the associated
array, a 7 in the column for “+” at row 4.
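As a rough illustration of the array idea, here is a small Python sketch (Lex itself emits C). The column order, the table size, and the use of -1 for "no transition" are assumptions made for the example; the tables in a real lex.yy.c are laid out differently and are not reproduced here.

```python
NO_TRANSITION = -1                 # assumed marker for a missing transition
COLUMNS = {"+": 0, "-": 1}         # hypothetical mapping: symbol -> column
NUM_STATES = 8                     # hypothetical number of rows (states 0..7)

# One row per state, one column per symbol, initially with no transitions.
table = [[NO_TRANSITION] * len(COLUMNS) for _ in range(NUM_STATES)]

# The fragment 4 --+--> 7: enter a 7 in the column for "+" at row 4.
table[4][COLUMNS["+"]] = 7


def next_state(state, ch):
    """Look up the successor of `state` on `ch`, or NO_TRANSITION."""
    col = COLUMNS.get(ch)
    if col is None:
        return NO_TRANSITION
    return table[state][col]


successor = next_state(4, "+")
```

Following a transition is then a single array lookup, which is why a table-driven scanner does a constant amount of work per input character.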
lex.yy.c keeps track of the latest accepting state it
has reached in any of the finite automata, plus the
number of source characters it has read at that point.
When it reaches a stage where no transition exists for
the next source symbol from any of the states it has
reached in any of the finite automata, it picks the
regular expression corresponding to the finite
automaton in which this last accepting state occurs,
and it pushes back onto the remaining input any
source characters read after reaching that state.
Consider, for example, a Lex defn. file containing:
{digit}+("."{digit}+)?            {…return Number;}
{digit}+("."{digit}+)?e{digit}+   {…return Float;}
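Finite automata corresponding to the above re’s are:
dfa for Number (start state 1; accepting states 2 and 4):
  1 --digit--> 2    2 --digit--> 2    2 --.--> 3
  3 --digit--> 4    4 --digit--> 4
dfa for Float (start state 1; accepting state 6):
  1 --digit--> 2    2 --digit--> 2    2 --.--> 3
  3 --digit--> 4    4 --digit--> 4
  2 --e--> 5        4 --e--> 5
  5 --digit--> 6    6 --digit--> 6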
Example: let the remaining input be
36e8=X1…
On reading the “3”, lex.yy.c records that the latest
accepting state encountered is state 2 in the dfa for
Number, and the no. of source characters read is 1. (It
has also reached state 2 in the dfa for Float, which is
not an accepting state there.)
On reading the “6”, lex.yy.c records the above again,
except that the no. of characters read is 2.
On reading the “e”, the dfa for Number has no successor
and the dfa for Float moves to a non-accepting state, so
the record is unchanged.
On reading the “8”, lex.yy.c records that the latest
accepting state is state 6 in the dfa for Float, and no. of
characters read is 4.
On reading the “=”, lex.yy.c finds that state 6 has no “=”
successor. This is the 5th character read. The last
accepting state (state 6) is in the dfa for Float and was
reached after 4 characters had been read. Hence Float is
taken as matching the remaining input, and the 5th character
read, i.e. the “=”, is pushed back onto the remaining input.
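The bookkeeping in this walk-through can be simulated with a short Python sketch. It is only an illustration of the idea, not the code Lex generates: the dfa tables are written out by hand from the two regular expressions, the names (NUMBER, FLOAT, next_token) are made up for the example, and the tie-break in favour of the earlier rule is the usual Lex convention rather than something stated in these slides.

```python
def symbol_class(ch):
    """Map a source character to the symbol used in the dfa tables."""
    return "digit" if ch in "0123456789" else ch


COMMON = {(1, "digit"): 2, (2, "digit"): 2, (2, "."): 3,
          (3, "digit"): 4, (4, "digit"): 4}
NUMBER = {"start": 1, "accepting": {2, 4}, "delta": dict(COMMON)}
FLOAT = {"start": 1, "accepting": {6},
         "delta": {**COMMON, (2, "e"): 5, (4, "e"): 5,
                   (5, "digit"): 6, (6, "digit"): 6}}
RULES = [("Number", NUMBER), ("Float", FLOAT)]   # in definition-file order


def next_token(text):
    """Return (rule name, matched text, remaining input), or None."""
    current = {name: dfa["start"] for name, dfa in RULES}
    last = None                      # (rule name, number of characters read)
    for count, ch in enumerate(text, start=1):
        sym = symbol_class(ch)
        advanced = {}
        for name, dfa in RULES:
            if name in current:      # this dfa is still alive
                nxt = dfa["delta"].get((current[name], sym))
                if nxt is not None:
                    advanced[name] = nxt
        if not advanced:             # no dfa has a transition: stop reading
            break
        current = advanced
        for name, dfa in RULES:
            if current.get(name) in dfa["accepting"]:
                last = (name, count) # remember the latest accepting point
                break                # assumed tie-break: earlier rule wins
    if last is None:
        return None
    name, count = last
    # characters read after the last accepting point are pushed back
    return name, text[:count], text[count:]


result = next_token("36e8=X1")
```

Here result is ("Float", "36e8", "=X1"), matching the walk-through. For the input "36e=" the same function returns ("Number", "36", "e="): the "e" was read but never led to an accepting state, so it is pushed back along with the "=".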