Languages and Compilers
(SProg og Oversættere)
Bent Thomsen
Department of Computer Science
Aalborg University
With acknowledgement to Elsa Gunter, on whose slides this lecture is based.
1
Type Checking
• When is op(arg1,…,argn) allowed?
• Type checking assures that operations are
applied to the right number of arguments
of the right types
– Right type may mean same type as was
specified, or may mean that there is a
predefined implicit coercion that will be
applied
• Used to resolve overloaded operations
2
Type Checking
• Type checking may be done statically
at compile time or dynamically at run
time
• Dynamically typed languages (often called
untyped, e.g. LISP, Prolog) do only dynamic
type checking
• Typed languages can do most type
checking statically
3
Dynamic Type Checking
• Performed at run-time before each
operation is applied
• Types of variables and operations left
unspecified until run-time
– Same variable may be used at different
types
4
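A minimal Python sketch of dynamic type checking, chosen because Python checks operand types at run time. The helper name `apply_plus` is hypothetical and not part of the lecture.

```python
def apply_plus(a, b):
    """Apply + and report whether the run-time type check passed."""
    try:
        return ("ok", a + b)
    except TypeError:
        # The check happens only when the operation is actually applied.
        return ("type error", None)


x = 3                            # x currently holds an int
int_result = apply_plus(x, 4)    # int + int is accepted

x = "three"                      # the same variable now holds a str
str_result = apply_plus(x, "!")  # str + str is accepted
bad_result = apply_plus(x, 4)    # str + int is rejected at run time
```

Nothing is rejected until the offending operation is executed, which is the cost of leaving types unspecified until run time.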
Static Type Checking
• Performed after parsing, before code
generation
• Type of every variable and signature
of every operator must be known at
compile time
5
Static Type Checking
• Can eliminate need to store type
information in data object if no
dynamic type checking is needed
• Catches many programming errors at
earliest point
6
Strongly Typed Language
• When no application of an operator
to arguments can lead to a run-time
type error, language is strongly typed
• Depends on definition of “type”
7
Strongly Typed Language
• C is “strongly typed” but type
coercions may cause unexpected
(undesirable) effects; no array
bounds check (in fact, no runtime
checks at all)
• SML is “strongly typed” but must still do
dynamic array bounds checks and
arithmetic overflow checks
8
How to Handle Type Mismatches
• Type checking to refuse them
• Apply implicit function to change
type of data
–Coerce int into real
–Coerce char into int
9
Conversion Between Types:
• Explicit: all conversions between
different types must be specified
• Implicit: some conversions between
different types implied by language
definition
– Implicit conversions called coercions
10
Coercion Examples
Example in Pascal:
var A: real;
B: integer;
A := B
–Implicit coercion - an automatic
conversion from one type to
another
11
Coercions Versus Conversions
• When A has type real and B has type int,
many languages allow coercion implicit in
A := B
• In the other direction (A of type int, B of type real),
often no coercion allowed; must use explicit conversion:
– A := round(B); Go to integer nearest B
– A := trunc(B); Delete fractional part of B
12
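A small Python sketch of the same distinction. Python variables are untyped, so the implicit coercion shows up in mixed arithmetic rather than in assignment; the variable names are illustrative only.

```python
import math

b_int = 7
a_real = 0.5 + b_int          # the int operand is coerced to float implicitly

b_real = 2.7
nearest = round(b_real)       # explicit conversion: go to the nearest integer
chopped = math.trunc(b_real)  # explicit conversion: delete the fractional part
```

Going from real to integer loses information, which is why the programmer has to pick `round` or `trunc` explicitly.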
Type Equality (aka Type Compatibility)
• When are two types “the same”?
• Name equivalence: two types equal
only if they have the same name
– Simple but restrictive
– Usually loosened to allow two types to be
equal when one is defined with the name
of the other (declaration equivalence)
13
Type Equality
• Structure equivalence: Two types
are equivalent if the underlying
data structures for each type are
the same
–Problem: how far to go – are two
records with the same number of
fields of same type, but different
labels equivalent?
14
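The two notions of type equality can be sketched in Python as follows. The `RecordType` class and both comparison functions are hypothetical helpers, and the `compare_labels` flag models the open question of whether differently labelled records count as equivalent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordType:
    name: str
    fields: tuple  # sequence of (label, type name) pairs


def name_equivalent(t1, t2):
    """Name equivalence: equal only if the type names match."""
    return t1.name == t2.name


def structurally_equivalent(t1, t2, compare_labels=True):
    """Structure equivalence: equal if the underlying structure matches."""
    if compare_labels:
        return t1.fields == t2.fields
    # Ignore labels and compare only the sequence of field types.
    return [ty for _, ty in t1.fields] == [ty for _, ty in t2.fields]


point = RecordType("point", (("x", "real"), ("y", "real")))
complex_ = RecordType("complex", (("re", "real"), ("im", "real")))
```

Here `point` and `complex_` differ by name and by labels, but match once labels are ignored.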
Elementary Data Types
• Data objects contain single data
value with no components
• Standard elementary types include:
integers, reals, characters,
booleans, enumerations, pointers
(references in SML)
15
Specification of Elementary Data Types
• Basic attributes of type usually used by
compiler and then discarded
• Some partial type information may occur
in data object
• Values usually match with hardware
types: 8 bits, 16 bits, 32 bits, 64 bits
• Operations: primitive operations with
hardware support, and user-defined
operations built from primitive ones
16
Integers – Specification
• Range of integers for some fixed
minint to some fixed maxint, typically
-2^31 through 2^31 – 1 or –2^30
through 2^30 - 1
• Standard collection of operators:
+, -, *, /, mod, ~ (negation)
• Standard relational operations:
=, <, >, <=, >=, =/=
17
Integers - Implementation
• Implementation:
– Binary representation in 2’s
complement arithmetic
– Three different standard
representations, built from these fields:
S = sign bit (0 for +, 1 for -)
Data = binary integer
18
Integers - Implementation
• First kind:
[ S | Data ]
S = sign bit (0 for +, 1 for -), Data = binary integer
19
Integers – Implementation
• Second kind
[ T | Address ] -> [ S | Data ]
T = type descriptor, S = sign bit
• Third kind
[ T | S | Data ]
T = type descriptor, S = sign bit
20
Integer Numeric Data
• Positive values
0 1 0 0 1 1 0 0 (leftmost bit is the sign bit)
= 64 + 8 + 4
= 76
21
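A short Python sketch of 2's complement encoding and decoding for an 8-bit word. The function names `to_bits` and `from_bits` are hypothetical.

```python
def to_bits(value, width=8):
    """Two's complement bit string of `value` in `width` bits."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise OverflowError("value does not fit in the given width")
    # Masking maps negative values onto their two's complement pattern.
    return format(value & ((1 << width) - 1), "0{}b".format(width))


def from_bits(bits):
    """Decode a two's complement bit string back to an integer."""
    raw = int(bits, 2)
    if bits[0] == "1":       # sign bit set: the value is negative
        raw -= 1 << len(bits)
    return raw
```

For example, `to_bits(76)` gives the pattern `01001100`, matching 64 + 8 + 4.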
Subranges
• Example (Ada):
A:integer range 10..20
• Subtype of integers (implicit
coercion into integer)
22
Subranges
• Data may require fewer bits than
integer type
–Data in example above require
only 4 bits
• Range checking usually requires
some runtime information and
dynamic type checking
23
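A hedged Python sketch of a subrange with a run-time range check. The `Subrange` class is hypothetical, and the bit count assumes the value is stored as an offset from the lower bound.

```python
class Subrange:
    """An integer subrange low..high with a run-time range check."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def bits_needed(self):
        # Assumption: store value - low, which lies in 0 .. high - low.
        return max(1, (self.high - self.low).bit_length())

    def check(self, value):
        """Return value unchanged if it is in range, else raise."""
        if not self.low <= value <= self.high:
            raise ValueError("value outside subrange")
        return value


a_range = Subrange(10, 20)
```

With the range 10..20 the stored offsets run from 0 to 10, so four bits suffice.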
IEEE Floating Point Format
• IEEE standard 754 specifies both a
32- and 64-bit standard
• At least one supported by most
hardware
• Numbers consist of three fields:
– S (sign), E (exponent), M (mantissa)
S
E
M
24
Floating Point Numbers: Theory
• Every non-zero number may be
uniquely written as
$(-1)^S \times 2^e \times m$
where $1 \le m < 2$ and S is either 0 or 1
25
Floating Point Numbers: Theory
• Every non-zero number may be
uniquely written as
$(-1)^S \times 2^{E - \mathrm{bias}} \times (1 + M/2^N)$
where $0 \le M < 2^N$, so that $0 \le M/2^N < 1$
• N is number of bits for M (23 or 52)
• Bias is 127 for the 32-bit format
• Bias is 1023 for the 64-bit format
26
IEEE Floating Point Format (32 Bits)
• S: a one-bit sign field. 0 is positive.
• E: an exponent in excess-127
notation. Values (8 bits) range from 0
to 255, corresponding to exponents
of 2 that range from -127 to 128.
27
IEEE Floating Point Format (32 Bits)
• M: a mantissa of 23 bits. Since the
first bit of the mantissa in a
normalized number is always 1, it
can be omitted and inserted
automatically by the hardware,
yielding an extra 24th bit of precision.
28
Exponent Bias
• With 8 bits (256 values), 127 is added to
the true exponent to get E
• If E = 127 then 127-127 = 0 is true
exponent
• If E = 129 then 129-127 = 2 is true
exponent
• If E = 120 then 120-127 = -7 is true
exponent
29
Floating Point Number Range
• In 32-bit format, the exponent has 8
bits giving a range from –127 to 128
for exponent
• This gives a number range from $10^{-38}$
to $10^{38}$, roughly speaking
30
Floating Point Number Range
• In 64-bit format, the exponent is
extended to 11 bits giving a range
from -1023 to +1024 for the
exponent
• This gives a range from $10^{-308}$ to
$10^{308}$, roughly speaking
31
Decoding IEEE format
• Given E, and M, the value of the
representation is (magnitude; the sign is given by S):
Parameters            Value
E = 255 and M ≠ 0     An invalid number (NaN)
E = 255 and M = 0     ∞
0 < E < 255           $2^{E-127} (1 + M/2^{23})$
E = 0 and M ≠ 0       $2^{-126} (M/2^{23})$
E = 0 and M = 0       0
32
Example Floating Point Numbers
• +1 = $2^0 \times 1$ = $2^{127-127} \times (1 + .0)$
0 01111111 000000…
• +1.5 = $2^0 \times 1.5$ = $2^{127-127} \times (1 + 2^{22}/2^{23})$
0 01111111 100000…
• -5 = $-2^2 \times 1.25$ = $-2^{129-127} \times (1 + 2^{21}/2^{23})$
1 10000001 010000…
33
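The 32-bit decoding rules can be checked with a short Python sketch. It uses the standard `struct` module to obtain the bit pattern; the function names `fields` and `decode` are hypothetical.

```python
import struct


def fields(x):
    """Split the 32-bit encoding of x into its S, E and M fields."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF


def decode(s, e, m):
    """Value of a 32-bit pattern, case by case as in the decoding rules."""
    sign = -1.0 if s else 1.0
    if e == 255:                       # invalid number or infinity
        return float("nan") if m else sign * float("inf")
    if e == 0:                         # zero or denormalized number
        return sign * 2.0 ** -126 * (m / 2 ** 23)
    return sign * 2.0 ** (e - 127) * (1 + m / 2 ** 23)
```

Calling `fields(-5.0)` yields S = 1, E = 129 and M = 2^21, and `decode` maps those fields back to -5.0.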
Other Numeric Data
• Short integers (C) - 16 bit, 8 bit
• Long integers (C) - 64 bit
• Boolean or logical - 1 bit with value
true or false (often stored as bytes)
• Byte - 8 bits
34
Other Numeric Data
• Character - Single 8-bit byte - 256
characters
• ASCII is a 7 bit 128 character code
• Unicode is commonly stored in 16-bit
code units (UTF-16, as in Java)
• In C, a char variable is simply 8-bit
integer numeric data
35
Enumerations
• Motivation: Type for case analysis over a
small number of symbolic values
• Example: (Ada)
type DAYS is (Mon, Tues, Wed, Thu, Fri,
Sat, Sun);
• Implementation: Mon → 0; … Sun → 6
• Treated as ordered type (Mon < Wed)
• In C, always implicitly coerced to integers
36
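For comparison, a Python sketch using the standard `enum.IntEnum`, which likewise numbers the values, orders them, and lets them be used as integers. The type name `Days` mirrors the Ada example and is otherwise an arbitrary choice.

```python
from enum import IntEnum

# start=0 gives Mon the value 0 and Sun the value 6.
Days = IntEnum("Days", "Mon Tues Wed Thu Fri Sat Sun", start=0)

is_ordered = Days.Mon < Days.Wed   # treated as an ordered type
as_integer = Days.Wed + 1          # implicitly usable as an integer
```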
Pointers
• A pointer type is a type in which the range
of values consists of memory addresses
and a special value, nil (or null)
• Use of pointers to create arbitrary
data structures
37
Pointer Data
• Each pointer can point to an object of
another data structure
– Its l-value is its address; its r-value is
the address of another object
• Accessing r-value of r-value of
pointer called dereferencing
38
Pointer Aliasing
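• A := B
– Numeric assignment: the value is copied
Before: A: 7.2, B: 0.4
After: A: 0.4, B: 0.4
– Pointer assignment: the address is copied
Before: A points to 7.2, B points to 0.4
After: A and B both point to 0.4 (aliases); A no longer points to 7.2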
39
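Aliasing can be observed in Python, where one-element lists stand in here for heap objects reached through pointers. This is only an analogy: Python has references rather than raw pointers, and it reclaims unreachable objects automatically.

```python
# Numeric assignment: the value is copied, the variables stay independent.
a = 7.2
b = 0.4
a = b
b = 9.9            # changing b afterwards does not affect a
numeric_a = a

# Pointer-like assignment: both names refer to one mutable cell.
cell_a = [7.2]
cell_b = [0.4]
cell_a = cell_b    # cell_a and cell_b are now aliases
cell_b[0] = 9.9    # an update through one alias is visible through the other
```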
Problems with Pointers
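• Dangling Pointer
A and B both point to the same object (0.4); after Delete A, B still points to the deallocated storage
• Garbage (lost heap-dynamic variables)
(Figure: heap objects 7.2 and 0.4; an object that no pointer refers to any longer is garbage)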
40
Ways to Create Dangling Pointers
int *A, *B;
A = new int;
*A = 5;
B = A;
delete A;
/* B is still pointing to the address of
the object A pointed to, which has been returned to the heap */
41
Ways to Create Dangling Pointers
int * A;
int * sub () { int B;
B = 5;
return &B;}
int main () { A = sub(); . . . }
/* A has been assigned the address of
an object that is out of scope */
42
SML references
• An alternative to allowing pointers directly
• References in SML can be typed
• … but they introduce some abnormalities
43
SML imperative constructs
• SML reference cells
– Different types for location and contents
x : int        non-assignable integer value
y : int ref    location whose contents must be integer
!y             the contents of location y
ref x          expression creating new cell initialized to x
– SML assignment
operator := applied to memory cell and new contents
– Examples
y := x+3       place value of x+3 in cell y; requires x:int
y := !y + 3    add 3 to contents of y and store in location y
44
SML examples
• Create cell and change contents
val x = ref "Bob";
x := "Bill";
• Create cell and increment
val y = ref 0;
y := !y + 1;
• While loop
val i = ref 0;
while !i < 10 do i := !i +1;
!i;
45
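For readers who do not know SML, the reference-cell operations can be mimicked in Python. The `Ref` class below is a hypothetical stand-in, not an SML feature: `Ref(v)` plays the role of `ref v`, `get()` of `!`, and `set()` of `:=`.

```python
class Ref:
    """A mutable cell, loosely mirroring an SML reference."""

    def __init__(self, contents):
        self.contents = contents      # like: ref contents

    def get(self):
        return self.contents          # like: !cell

    def set(self, new_contents):
        self.contents = new_contents  # like: cell := new_contents


# The while loop from the SML examples, written against Ref.
i = Ref(0)
while i.get() < 10:
    i.set(i.get() + 1)
```

Unlike SML, Python does not check statically that the contents of a cell keep the same type.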
Composite Data Types
• Composite data types are sets of
data objects built from data objects of
other types
• Elements called data structures
• Some created by users, eg an array
of integers
• Some created internally by compiler,
eg symbol table, or subroutine
activation record
46
Specification of Structured Data Types
• Number of components
– Fixed or varying over life of data
structure
• Arrays and records have fixed
number
• Lists have variable number
– If variable number of components, is
there a max number possible?
47
Specification of Structured Data Types
• Type of each component
–Homogeneous: all components
have same type
• Arrays
–Heterogeneous: components have
varying types
• Records (also lists in some
languages, but not SML)
48
Specification of Structured Data Types
• Method of accessing components
–Array subscripting
–Record labels
–SML datatype pattern matching
49
Operations on Data Structures
• Creation and deletion of
structures
• Whole-structure operations
–Assigning to variable
–Iterating a function over the
structure
–Computing its length or size
50
Operations on Data Structures
• Component selection operations
– Direct access (aka random selection)
• Takes constant time
– Sequential selection
• Usually proportional to some
dimension of the structure (like the
number of components)
– May allow component update, or may
only allow access to value
51
Operations on Data Structures
• Component insertion and deletion
– Applies to structures with variable
number of components
– Causes major effects on possible data
layouts
• Example seen in the layouts for
strings
52
General Layout of Data Structures
• Descriptor
– Contains type information and other
attributes of data structure
– May only exist in symbol table at
compile time, or may be a direct part of
data object, or split between two
– Usually several words long
53
General Layout of Data Structures
• Layout of component data
–Sequential: arrays and records
• Uses least storage for structure if
number of components fixed
• Least flexible for overall storage
management
54
General Layout of Data Structures
• Layout of component data
–Linked: lists, trees
• Uses more space per structure
since each component must also
have a pointer to it
• Maximum flexibility for overall
storage management, put pieces
where they fit
55
Strings
• Character string is a data object
composed of a sequence of
characters
• Main kinds:
– Fixed declared length
– Variable length with declared maximum
length
– Unbounded length
56
String operations
• String concatenation
• Length of string
• Substring selection by position
• Lexicographical ordering (based on
underlying codes such as ASCII)
• Substring by pattern matching
57
String Interface
• Can be implemented as primitive
type (as in SML or Java) or an array
of characters (as in C and C++)
• If primitive, operations are built in
• If array of characters, string
operations provided through a library
58
String Implementations
• Fixed declared length (aka static
length)
–Packed array padded with blanks
Descriptor: String | Length=12 | Pointer to data
Data: A l l • a b o a r d ø ø (ø marks a blank used as padding)
59
String Implementations
• May need runtime descriptor
for type, and length if
substring operations include
runtime checks
• Update pads with blanks or
truncates as necessary
60
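The pad-or-truncate rule for fixed declared length strings can be sketched in Python. The function name `assign_fixed` is hypothetical.

```python
def assign_fixed(text, length, pad=" "):
    """Store text in a fixed-length string: truncate, then pad with blanks."""
    return text[:length].ljust(length, pad)


padded = assign_fixed("All aboard", 12)              # two blanks appended
truncated = assign_fixed("All aboard the train", 12)  # cut to 12 characters
```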
String Implementations
• Variable length with declared
maximum (aka limited dynamic
length)
– Packed array with runtime descriptor
Descriptor: String | Max Length=12 | Cur Length=10 | Pointer to data
Data: A l l • a b o a r d
61
String Implementations
• Descriptor may occur as initial
block of data object for array
62
String Implementations
• Unbounded length (aka dynamic length)
– Two standard implementations
– First: Linked list
Descriptor: String | Curr Length = 10 | Pointer to data
Data: linked cells of two characters each: [A l] -> [l •] -> [a b] -> [o a] -> [r d]
63
String Implementations
• Unbounded length
– Second implementation: null terminated
contiguous array
Descriptor: String | Pointer to data
Data: A l l • a b o a r d, followed by a null terminator
– Must reallocate and copy when string
grows
64
Arrays
• Ordered sequence of fixed number of
objects all of the same type
• Indexed by integer, subrange, or
enumeration type, called subscript
• Multidimensional arrays have one
subscript per each dimension
• L-value for array element given by
accessing formula
65
Type Checking Arrays
• Basic type – array
• Number of dimensions
• Type of components
• Type of subscript
• Range of subscript (must be done at
runtime, if at all)
66
Array Layout
• Assume one dimension
Descriptor: 1 dim array | Virtual Origin (VO) | Lower Bound (LB) | Upper Bound (UB) | Comp type | Comp size (E)
Data: A[LB], A[LB+1], …, A[UB]
VO is the address where A[0] would be stored
67
Array Component Access
• Component access through
subscripting, both for lookup (r-value)
and for update (l-value)
• Component access should take
constant time (ie. looking up the 5th
element takes same time as looking
up 100th element)
68
Array Access Function
• L-value of A[i] = VO + (E * i)
= α + (E * (i – LB)), where α is the address of A[LB]
• Computed at compile time
• VO = α - (E * LB)
• More complicated for multiple
dimensions
69
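A small Python sketch of the one-dimensional access function. The parameter `base` stands for the address of the first element A[LB]; all names and the example numbers are illustrative assumptions.

```python
def virtual_origin(base, lower_bound, elem_size):
    """VO = base - E * LB, the address where A[0] would be stored."""
    return base - elem_size * lower_bound


def element_address(vo, elem_size, i):
    """L-value of A[i] = VO + E * i."""
    return vo + elem_size * i


# Example: array A[5..9] of 4-byte components stored from address 1000.
vo = virtual_origin(base=1000, lower_bound=5, elem_size=4)
first = element_address(vo, 4, 5)
third = element_address(vo, 4, 7)
```

Because VO is fixed once the array is laid out, each access costs one multiplication and one addition, regardless of the index.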
Records
• Ordered sequence of fixed number of
objects of differing types
• Indexed by fixed identifiers called
labels or fields
• L-value for record element given by
more complex accessing formula
than for arrays
70
Typical Record Layout
Descriptor: Record type | Num. of components | Comp 1 label | Comp 1 type | Comp 1 location | … | Comp n label | Comp n type | Comp n location
Data: R.1 | R.2 | … | R.n
71
Type Checking Record
• Basic type – record
• Number, name (label) of
components
• Possibly order of labels
– If order matters, labels must be
unique
– If order doesn’t matter, layout must
give a canonical ordering
• Type of components per label
72
Record Layout
• Most of descriptor exists only at compile
time
• Access function:
• Comp i location given by
L-value of R.i = $\alpha + \sum_{j=1}^{i-1} (\text{size of } R.j)$, where α is the address of the start of the record
73
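The record access function amounts to a running sum of component sizes. A Python sketch, assuming no alignment padding between components; the function name `field_offsets` and the example sizes are hypothetical.

```python
def field_offsets(sizes):
    """Offset of each component: the total size of the components before it."""
    offsets = []
    total = 0
    for size in sizes:
        offsets.append(total)
        total += size
    return offsets


# Example record with components of 4, 8 and 1 bytes, stored from address 100.
offsets = field_offsets([4, 8, 1])
third_component_address = 100 + offsets[2]
```

Since the sizes are known at compile time, a compiler can fold each offset into a constant.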
Lists
• Ordered collection of variable
number of elements
–Many languages (LISP, Scheme,
Prolog) allow heterogeneous list
–SML has only homogeneous lists
74
Lists
• Layout: linked series of cells
(called cons cells) with descriptor,
data and pointers
–Data in first cell of list called head
of list
–R-value of pointer in first cell called
tail of list
75
Lists
• Sequential access of data by
following pointers
–Access is linear in position in
list
• Takes twice as long to look up
10th element as to look up 5th
element
76
Lists
• Adding a new element to list
done only at head, called
consing
• Creates new cell with element
to be added and pointer to old
list (ie. creates new list)
77
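Cons cells, consing at the head, and sequential access can be sketched in Python. The `Cons` class and the `nth` function are hypothetical names; `None` stands for the empty list.

```python
class Cons:
    """One list cell: a data item (head) and a pointer to the rest (tail)."""

    def __init__(self, head, tail):
        self.head = head
        self.tail = tail   # another Cons, or None for the empty list


def nth(cell, n):
    """Follow n tail pointers, then return the head (0-based position)."""
    for _ in range(n):
        cell = cell.tail
    return cell.head


old_list = Cons(2.5, Cons("a", None))
new_list = Cons(1, old_list)   # consing: new cell pointing at the old list
```

Consing leaves `old_list` untouched, so the old and new lists share their cells.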
List Layout
• Example: [1,2.5,’a’]
(Figure: three linked cons cells, each with descriptor list; their data are int 1, real 2.5 and char ‘a’)
78
List Layout
• Example: [[1,2.5],[’a’]]
(Figure: an outer list of two cons cells; the first holds the list of int 1 and real 2.5, the second holds the list of char ‘a’)
79
Union Types
• Set-wise the (discriminated) union of
the component types
• Interchangeable with variant records
as primitive type construct
• Elements chosen from one of
component types
80
Union Types
• Problem: if int occurs as two
different components of union
type, can we tell which
component an int is for?
81
Union Types
• Two kinds of union types:
–Free union - Ans: no
–Discriminated union – Ans: yes
• If each component is tagged to
separate occurrences of same type,
discriminated union, otherwise not
82
Union Layout
Descriptor: Union type | Component type | Component tag | Component location
Data: Actual data | Unused space (total length L)
• No tag if free union
• L is fixed length of biggest component
83
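The difference between a free and a discriminated union can be sketched in Python. The `Tagged` class and the tags "age" and "year" are hypothetical; they model a union in which int occurs as two different components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A discriminated union value: the tag records which component it is."""
    tag: str
    value: int


# Free union: only the raw data is stored, so the two uses of int coincide.
free_age = 30
free_year = 30

# Discriminated union: the tag keeps the two components apart.
tagged_age = Tagged("age", 30)
tagged_year = Tagged("year", 30)
```

The tag costs extra storage in every value, which is the price of being able to answer "which component is this int for?".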
Combining Data Structures
• Possible to have any of the
above structures as components
of others
• Since lists are of variable size,
but arrays must store fixed size
element, how to store lists in an
array?
84
Combining Data Structures
• Answer: cons cells have uniform
size, store just the leading cons
cell
85
Example:
• Data in 4-element array of lists
(Figure: each array element holds the leading cons cell of a list of ints; the cells shown hold the values 5, 6, 3, 1, 2 and 7)
86
Type summary
• Static type checking takes place after syntax
check and before code generation
• Some type checking can be necessary at run
time
• Types vs. Syntax
• Simply typed values and composite values
• User defined types
• Equivalence on types
87