Survey							
                            
		                
		                * Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
College of Computer Science & Technology
Compiler Construction Principles &
Implementation Techniques
Dr. Ying JIN
Associate Professor
Oct. 2007
Compiler Construction Principles & Implementation Techniques
-1-
College of Computer Science & Technology
–
–
–
–
–
What we have already introduced
What this course is about?
What is a compiler?
The ways to design and implement a compiler;
General functional components of a compiler;
The general translating process of a compiler;
Compiler Construction Principles & Implementation Techniques
-2-
College of Computer Science & Technology
What will be introduced
• Scanning
– General information about a Scanner (scanning)
• Functional requirement (input, output, main function)
• Data structures
– General techniques in developing a scanner
• How to define the lexeme of a programming language?
• How to implement a scanner with respect to the definition of
lexeme?
Compiler Construction Principles & Implementation Techniques
-3-
College of Computer Science & Technology
What will be introduced
• Scanning
– General techniques in developing a scanner
• Two formal languages
– Regular expression
– Finite automata (DFA, NFA)
• Three algorithms
– From regular expression to NFA
– From NFA to DFA
– Minimizing DFA
• One implementation
– Implementing DFA to get a scanner
Compiler Construction Principles & Implementation Techniques
-4-
College of Computer Science & Technology
Outline
2.1 Overview
2.1.1 General Function of a Scanner
2.1.2 Some Issues about Scanning
2.2 Finite Automata
2.2.1 Definition and Implementation of DFA
2.2.2 Non-Determinate Finite Automata
2.2.3 Transforming NFA into DFA
2.2.4 Minimizing DFA
2.3 Regular Expressions
2.3.1 Definition of Regular Expressions
2.3.2 Regular Definition
2.3.4 From Regular Expression to DFA
2.4 Design and Implementation of a Scanner
2.4.1 Developing a Scanner from DFA
2.4.2 A Scanner Generator – Lex
Compiler Construction Principles & Implementation Techniques
-5-
College of Computer Science & Technology
Knowledge Relation Graph
Lexical definition using
Regular Expression
transfo NFA
rming
transforming
basing on
DFA
minimizing
Develop a
Scanner
Compiler Construction Principles & Implementation Techniques
implement
minimized
DFA
-6-
College of Computer Science & Technology
Outline
2.1 Overview
2.1.1 General Function of a Scanner
2.1.2 Some Issues about Scanning
2.2 Finite Automata
2.2.1 Definition and Implementation of DFA
2.2.2 Non-Determinate Finite Automata
2.2.3 Transforming NFA into DFA
2.2.4 Minimizing DFA
2.3 Regular Expressions
2.3.1 Definition of Regular Expressions
2.3.2 Regular Definition
2.3.4 From Regular Expression to DFA
2.4 Design and Implementation of a Scanner
2.4.1 Developing a Scanner from DFA
2.4.2 A Scanner Generator – Lex
Compiler Construction Principles & Implementation Techniques
-7-
College of Computer Science & Technology
2.1 Overview
• Status of Scanning in a Compiler
• Token
• General function of scanning
– Input
– Output
– Functional description
• Some issues about scanning
– Data structure
– Blank/tab, return, newline, comments
– Lexical errors
Compiler Construction Principles & Implementation Techniques
-8-
College of Computer Science & Technology
Status of Scanning in a Compiler
Lexical Analysis
scanning
Syntax Analysis
Parsing
Target Code
Generation
Intermediate Code
Optimization
Semantic Analysis
Intermediate Code
Generation
analysis/front end
synthesis/back end
Compiler Construction Principles & Implementation Techniques
-9-
College of Computer Science & Technology
Token
• 单词的内部表示
• Is the minimal semantic unit in a programming
language;
• A token includes two parts
– Token Type
– Attributes (Semantic information)
• Example:
– < id, “ x ” >
– < intNum, 10 >
Compiler Construction Principles & Implementation Techniques
-10-
College of Computer Science & Technology
Two Forms of a Scanner
• Independent
– A scanner is independent of other parts of a compiler
– Output: a sequence of tokens;
• Attached
– As an auxiliary function to the parser;
– Returns a token once called by the parser;
Compiler Construction Principles & Implementation Techniques
-11-
Scanner
College of Computer Science & Technology
• Two forms
CharList
call
Attached
Scanner
Parser
Token
CharList
Independent
TokenList
Parser
Scanner
Compiler Construction Principles & Implementation Techniques
-12-
College of Computer Science & Technology
General Function of Scanning
• Input
– Source program
• Output
– Sequence of tokens/ token
• Functional description (similar to spelling)
– Reads source program;
– Recognizes words one by one according to the lexical definition of
the source language;
– Builds internal representation of words – tokens;
– Checks lexical errors;
– Returns the sequence of tokens/token;
Compiler Construction Principles & Implementation Techniques
-13-
College of Computer Science & Technology
Some Issues about Scanning
Compiler Construction Principles & Implementation Techniques
-14-
College of Computer Science & Technology
Tokens
• Source program
– Stream of characters
• Token
– A sequence of characters that can be treated as a unit in the
grammar of a programming language;
– Token types
•
•
•
•
•
Identifier: x, y1, ……
Number: 12, 12.3,
Keywords (reserved words): int, real, main,
Operator: +, -, *, / , >, < , ……
Delimiter: ;, {, }, ……
Compiler Construction Principles & Implementation Techniques
Each can be
treated as
one type!
-15-
College of Computer Science & Technology
Tokens
• The content of a token
token-type
Semantic information
Semantic information (attributes):
- Identifier: the string
- Number: the value
- Keywords (reserved words):
the number in the keyword table
- Operator: itself
- Delimiter: itself
Compiler Construction Principles & Implementation Techniques
-16-
College of Computer Science & Technology
Token
• Data Structure
– Token type
enum
– One token
struct
– Sequence of tokens
list
Please define data structure with C !!!
Compiler Construction Principles & Implementation Techniques
-17-
College of Computer Science & Technology
Keywords
• Keywords
– Words that have special meaning
– Can not be used for other meanings
• Reserved words
– Words that are reserved by a programming language for special
meaning;
– Can be used for other meaning, overload previous meaning;
• Keyword table
– to record all keywords defined by the source programming language
Compiler Construction Principles & Implementation Techniques
-18-
College of Computer Science & Technology
Sample Source Program
v
a
r
 i n
t
r
e
a
d
) ;
)
;
 i f △
x
>
y
w
r
i
1
)
△ e
r
i
)
;
 }
t
t
e
(
e
(
x
(
0
,
y
;
 { 
 r
e
a
d
(
y
△ t h
e
n
△
△ x
Compiler Construction Principles & Implementation Techniques
l
s
e △ w
-19-
Sequence of Tokens
College of Computer Science & Technology
var,k 
int, k
x, ide
,
y, ide
;
{
read, k (
x, ide
)
;
read, k
(
y, ide
;
if, k
x, ide
>
y , ide
then, k write,k (
;
else, k
write,k (
}
)
1, num )
0, num )
Compiler Construction Principles & Implementation Techniques
;
-20-
College of Computer Science & Technology
•
•
•
•
Blank, tab, newline, comments
No semantic meaning
Only for readability
Can be removed
Line number should be calculated;
Compiler Construction Principles & Implementation Techniques
-21-
College of Computer Science & Technology
The End of Scanning
• Two optional situations
– Once read the character representing the end of a
program;
• PASCAL: ‘.’
– The end of the source program file
Compiler Construction Principles & Implementation Techniques
-22-
College of Computer Science & Technology
Lexical Errors
• Limited types of errors can be found during scanning
– illegal character ;
• §, ←
– the first character is wrong;
• “/abc”
• Lexical Error Recovery
– Once a lexical error has been found, scanner will not stop, it will
take some measures to continue the process of scanning
• Ignore current character, start from next character
if § a then x = 12.else ……
Compiler Construction Principles & Implementation Techniques
-23-
College of Computer Science & Technology
To Develop a Scanner
• Now we know what is the function of a Scanner;
• How to implement a scanner?
– Basis : lexical rules of the source programming language
• Set of allowed characters;
• What kinds of tokens it has?
• The structure of each token-type
– Scanner will be developed according to the lexical rules;
Compiler Construction Principles & Implementation Techniques
-24-
College of Computer Science & Technology
How to define
the lexical structure of a programming language?
1. Natural language (English, Chinese ……)
- easy to write, ambiguous and hard to implement
2. Formal languages
- need some special background knowledge;
- concise, easy to implement;
- automation;
Compiler Construction Principles & Implementation Techniques
-25-
College of Computer Science & Technology
• Two formal languages for defining lexical structure
of a programming language
– Finite automata
• Non-deterministic finite automata (NFA)
• Deterministic finite automata (DFA)
– Regular expressions (RE)
– Both of them can formally describing lexical structure
• The set of allowed words
– They are equivalent;
– FA is easy to implement; RE is easy to define;
Compiler Construction Principles & Implementation Techniques
-26-
College of Computer Science & Technology
•
•
•
•
•
•
Summary
What is the main function of scanning (a scanner)?
Two forms of Scanner?
What is a token? The content of a token?
Semantic information for different token types?
Lexical error?
Two key problems in design and implementation of a
scanner?
Compiler Construction Principles & Implementation Techniques
-27-
College of Computer Science & Technology
Reading Assignment (2)
• Topic: Different methods for dealing with lexical
errors?
• Objectives
– Try to find out different ways to cope with lexical errors;
• Tips
– There are different schema (error correction; error repair;
error recovery;……)
– Google some research papers;
– Treat it as a survey;
– challenging!
Compiler Construction Principles & Implementation Techniques
-28-