Languages and Compilers (SProg og Oversættere)
Bent Thomsen
Department of Computer Science
Aalborg University
With acknowledgement to Elsa Gunter, whose slides this lecture is based on.

Type Checking
• When is op(arg1,…,argn) allowed?
• Type checking assures that operations are applied to the right number of arguments of the right types
  – Right type may mean same type as was specified, or may mean that there is a predefined implicit coercion that will be applied
• Used to resolve overloaded operations
• Type checking may be done statically at compile time or dynamically at run time
• Untyped languages (e.g. LISP, Prolog) do only dynamic type checking
• Typed languages can do most type checking statically

Dynamic Type Checking
• Performed at run time before each operation is applied
• Types of variables and operations left unspecified until run time
  – Same variable may be used at different types

Static Type Checking
• Performed after parsing, before code generation
• Type of every variable and signature of every operator must be known at compile time
• Can eliminate need to store type information in data object if no dynamic type checking is needed
• Catches many programming errors at earliest point

Strongly Typed Language
• When no application of an operator to arguments can lead to a run-time type error, language is strongly typed
• Depends on definition of “type”
• C is “strongly typed” but type coercions may cause unexpected (undesirable) effects; no array bounds check (in fact, no runtime checks at all)
• SML is “strongly typed” but still must do dynamic array bounds checks and arithmetic overflow checks

How to Handle Type Mismatches
• Type checking to refuse them
• Apply implicit function to change type of data
  – Coerce int into real
  – Coerce char into int

Conversion Between Types
• Explicit: all conversions between different types must be specified
• Implicit: some conversions between different types implied by language definition
  – Implicit conversions called coercions

Coercion Examples
• Example in Pascal:
      var A: real;
          B: integer;
      A := B
  – Implicit coercion: an automatic conversion from one type to another

Coercions Versus Conversions
• When A has type real and B has type int, many languages allow coercion implicit in A := B
• In the other direction (A of type int, B of type real), often no coercion allowed; must use explicit conversion:
  – A := round(B); Go to integer nearest B
  – A := trunc(B); Delete fractional part of B

Type Equality (aka Type Compatibility)
• When are two types “the same”?
• Name equivalence: two types equal only if they have the same name
  – Simple but restrictive
  – Usually loosened to allow two types to be equal when one is defined with the name of the other (declaration equivalence)
• Structure equivalence: two types are equivalent if the underlying data structures for each type are the same
  – Problem: how far to go. Are two records with the same number of fields of the same types, but different labels, equivalent?
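
The difference between name equivalence and structure equivalence can be made concrete with a small sketch. This is only an illustration in Python, not how any particular compiler does it: the record types `Point`, `Vec` and `Polar` and the helper functions are hypothetical names introduced here, and the `compare_labels` flag stands for the open question above of whether labels should count.

```python
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Vec:            # same labels and field types as Point, different name
    x: float
    y: float


@dataclass
class Polar:          # same field types as Point, different labels
    r: float
    theta: float


def name_equivalent(t1, t2):
    """Name equivalence: two types are equal only if they are the same named type."""
    return t1 is t2


def structure_equivalent(t1, t2, compare_labels=True):
    """Structure equivalence: compare the sequence of fields of two record types.

    With compare_labels=False only the field types are compared, so records
    that differ only in their labels are treated as equivalent.
    """
    f1 = [(f.name, f.type) for f in fields(t1)]
    f2 = [(f.name, f.type) for f in fields(t2)]
    if not compare_labels:
        f1 = [t for _, t in f1]
        f2 = [t for _, t in f2]
    return f1 == f2


print(name_equivalent(Point, Vec))                                # False
print(structure_equivalent(Point, Vec))                           # True
print(structure_equivalent(Point, Polar))                         # False
print(structure_equivalent(Point, Polar, compare_labels=False))   # True
```

The last two lines show that "how far to go" is a design decision: the same pair of records is equivalent or not depending on whether labels are part of the structure.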

Elementary Data Types
• Data objects contain single data value with no components
• Standard elementary types include: integers, reals, characters, booleans, enumerations, pointers (references in SML)

Specification of Elementary Data Types
• Basic attributes of type usually used by compiler and then discarded
• Some partial type information may occur in data object
• Values usually match with hardware types: 8 bits, 16 bits, 32 bits, 64 bits
• Operations: primitive operations with hardware support, and user-defined operations built from primitive ones

Integers – Specification
• Range of integers from some fixed minint to some fixed maxint, typically -2^31 through 2^31 – 1 or -2^30 through 2^30 – 1
• Standard collection of operators: +, -, *, /, mod, ~ (negation)
• Standard relational operations: =, <, >, <=, >=, =/=

Integers – Implementation
• Binary representation in 2’s complement arithmetic
• Three different standard representations:
  – First kind: [ S | Data ], where S is the sign bit (0 for +, 1 for -) and Data is the binary integer
  – Second kind: [ T | Address ] → [ S | Data ], where T is the type descriptor
  – Third kind: [ T | S | Data ], with type descriptor, sign bit and data together

Integer Numeric Data
• Positive values: 0 1 0 0 1 1 0 0 = 64 + 8 + 4 = 76 (the leading 0 is the sign bit)

Subranges
• Example (Ada): A: integer range 10..20
• Subtype of integers (implicit coercion into integer)
• Data may require fewer bits than integer type
  – Data in example above require only 4 bits (11 distinct values, if stored as an offset from the lower bound)
• Range checking usually requires some run-time information and dynamic type checking

IEEE Floating Point Format
• IEEE standard 754 specifies both a 32-bit and a 64-bit format
• At least one supported by most hardware
• Numbers consist of three fields: S (sign), E (exponent), M (mantissa)
      [ S | E | M ]

Floating Point Numbers: Theory
• Every non-zero number may be uniquely written as
      $(-1)^S \cdot 2^e \cdot m$
  where $1 \le m < 2$ and S is either 0 or 1
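
The normalized form above, and the three stored fields S, E and M, can be inspected directly. The following is a minimal Python sketch under stated assumptions: the function names `normalized` and `fields32` are made up for this illustration, and `fields32` looks at the 32-bit format only. The field widths it uses (1 sign bit, 8 exponent bits, 23 mantissa bits) are those of the 32-bit IEEE 754 format, and the stored E is a biased exponent rather than the true exponent e.

```python
import math
import struct


def normalized(x):
    """Return (S, e, m) such that x == (-1)**S * 2**e * m with 1 <= m < 2.

    Only defined for finite, non-zero x.
    """
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("normalized form needs a finite, non-zero number")
    s = 1 if x < 0 else 0
    frac, exp = math.frexp(abs(x))   # abs(x) == frac * 2**exp, 0.5 <= frac < 1
    return s, exp - 1, frac * 2      # rescale so that 1 <= m < 2


def fields32(x):
    """Return the raw (S, E, M) bit fields of x stored as a 32-bit IEEE 754 float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31                   # 1 sign bit
    e = (bits >> 23) & 0xFF          # 8 exponent bits (biased)
    m = bits & 0x7FFFFF              # 23 mantissa bits
    return s, e, m


print(normalized(-5.0))   # (1, 2, 1.25)
print(fields32(-5.0))     # (1, 129, 2097152), and 2097152 == 2**21
```

For -5.0 the true exponent is 2 while the stored exponent field is 129, and the stored mantissa 2^21 encodes the fractional part 0.25 of m = 1.25 as 2^21 / 2^23.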

Floating Point Numbers: Theory
• Every non-zero number may be uniquely written as
      $(-1)^S \cdot 2^{E - \text{bias}} \cdot (1 + M/2^N)$
  where $0 \le M/2^N < 1$
• N is number of bits for M (23 or 52)
• Bias is 127 for 32-bit floats
• Bias is 1023 for 64-bit floats

IEEE Floating Point Format (32 Bits)
• S: a one-bit sign field. 0 is positive.
• E: an exponent in excess-127 notation. Values (8 bits) range from 0 to 255, corresponding to exponents of 2 that range from -127 to 128.
• M: a mantissa of 23 bits. Since the first bit of the mantissa in a normalized number is always 1, it can be omitted and inserted automatically by the hardware, yielding an extra 24th bit of precision.

Exponent Bias
• With 8 bits (256 values), +127 is added to the true exponent to get E
• If E = 127 then 127 - 127 = 0 is true exponent
• If E = 129 then 129 - 127 = 2 is true exponent
• If E = 120 then 120 - 127 = -7 is true exponent

Floating Point Number Range
• In 32-bit format, the exponent has 8 bits giving a range from -127 to 128 for the exponent
• This gives a number range from 10^-38 to 10^38, roughly speaking
• In 64-bit format, the exponent is extended to 11 bits giving a range from -1023 to +1024 for the exponent
• This gives a range from 10^-308 to 10^308, roughly speaking

Decoding IEEE format
• Given E and M, the value of the representation is:
      Parameters             Value
      E = 255 and M ≠ 0      An invalid number
      E = 255 and M = 0      ∞
      0 < E < 255            $2^{E-127}(1 + M/2^{23})$
      E = 0 and M ≠ 0        $2^{-126}(M/2^{23})$
      E = 0 and M = 0        0

Example Floating Point Numbers
• $+1 = 2^0 \cdot 1 = 2^{127-127} \cdot (1 + 0.0)$, stored as 0 01111111 000000…
• $+1.5 = 2^0 \cdot 1.5 = 2^{127-127} \cdot (1 + 2^{22}/2^{23})$, stored as 0 01111111 100000…
• $-5 = -2^2 \cdot 1.25 = -2^{129-127} \cdot (1 + 2^{21}/2^{23})$, stored as 1 10000001 010000…

Other Numeric Data
• Short integers (C): 16 bit, 8 bit
• Long integers (C): 64 bit
• Boolean or logical: 1 bit with value true or false (often stored as bytes)
• Byte: 8 bits
• Character: single 8-bit byte, 256 characters
• ASCII is a 7-bit, 128-character code
• Unicode is a 16-bit character code (Java)
• In C, a char variable is simply 8-bit integer numeric data

Enumerations
• Motivation: type for case analysis over a small number of symbolic values
• Example (Ada): type DAYS is (Mon, Tues, Wed, Thu, Fri, Sat, Sun)
• Implementation: Mon → 0; … Sun → 6
• Treated as ordered type (Mon < Wed)
• In C, always implicitly coerced to integers

Pointers
• A pointer type is a type in which the range of values consists of memory addresses and a special value, nil (or null)
• Use of pointers to create arbitrary data structures

Pointer Data
• Each pointer can point to an object of another data structure
  – Its l-value is its address; its r-value is the address of another object
• Accessing r-value of r-value of pointer called dereferencing

Pointer Aliasing
• A := B
  – Numeric assignment: before, A: 7.2 and B: 0.4; after, A: 0.4 and B: 0.4 (two separate copies)
  – Pointer assignment: before, A points to 7.2 and B points to 0.4; after, A and B both point to the same 0.4 object

Problems with Pointers
• Dangling pointer: A and B point to the same object (0.4); after "Delete A", B still points to the deallocated object
• Garbage (lost heap-dynamic variables): an object (7.2 in the figure) is still allocated but no pointer refers to it any longer

Ways to Create Dangling Pointers
• Deleting an object through one of two aliases:
      int *A, *B;
      A = new int;
      *A = 5;
      B = A;
      delete A;
      /* B is still pointing to the address of the object A pointed to, which has been returned to the heap */
• Returning the address of a local variable:
      int *A;
      int *sub() {
          int B;
          B = 5;
          return &B;
      }
      int main() {
          A = sub();
          . . .
      }
      /* A has been assigned the address of an object that is out of scope */

SML references
• An alternative to allowing pointers directly
• References in SML can be typed
• … but they introduce some abnormalities

SML imperative constructs
• SML reference cells
  – Different types for location and contents
      x : int        non-assignable integer value
      y : int ref    location whose contents must be integer
      !y             the contents of location y
      ref x          expression creating new cell initialized to x
  – SML assignment operator := applied to memory cell and new contents
  – Examples
      y := x+3       place value of x+3 in cell y; requires x:int
      y := !y + 3    add 3 to contents of y and store in location y

SML examples
• Create cell and change contents
      val x = ref "Bob";
      x := "Bill";
• Create cell and increment
      val y = ref 0;
      y := !y + 1;
• While loop
      val i = ref 0;
      while !i < 10 do i := !i + 1;
      !i;

Composite Data Types
• Composite data types are sets of data objects built from data objects of other types
• Elements called data structures
• Some created by users, e.g. an array of integers
• Some created internally by compiler, e.g. symbol table or subroutine activation record

Specification of Structured Data Types
• Number of components
  – Fixed or varying over life of data structure
    • Arrays and records have fixed number
    • Lists have variable number
  – If variable number of components, is there a maximum number possible?
• Type of each component
  – Homogeneous: all components have same type
    • Arrays
  – Heterogeneous: components have varying types
    • Records (also lists in some languages, but not SML)
• Method of accessing components
  – Array subscripting
  – Record labels
  – SML datatype pattern matching

Operations on Data Structures
• Creation and deletion of structures
• Whole-structure operations
  – Assigning to variable
  – Iterating a function over the structure
  – Computing its length or size
• Component selection operations
  – Direct access (aka random selection)
    • Takes constant time
  – Sequential selection
    • Usually proportional to some dimension of the structure (like the number of components)
  – May allow component update, or may only allow access to value
• Component insertion and deletion
  – Applies to structures with variable number of components
  – Causes major effects on possible data layouts
    • Example seen in the layouts for strings

General Layout of Data Structures
• Descriptor
  – Contains type information and other attributes of data structure
  – May only exist in symbol table at compile time, or may be a direct part of data object, or split between two
  – Usually several words long
• Layout of component data
  – Sequential: arrays and records
    • Uses least storage for structure if number of components fixed
    • Least flexible for overall storage management
  – Linked: lists, trees
    • Uses more space per structure since each component must also have a pointer to it
    • Maximum flexibility for overall storage management: put pieces where they fit

Strings
• Character string is a data object composed of a sequence of characters
• Main kinds:
  – Fixed declared length
  – Variable length with declared maximum length
  – Unbounded length

String operations
• String concatenation
• Length of string
• Substring selection by position
• Lexicographical ordering (based on underlying codes such as ASCII)
• Substring by pattern matching

String Interface
• Can be implemented as primitive type (as in SML or Java) or an array of characters (as in C and C++)
• If primitive, operations are built in
• If array of characters, string operations provided through a library

String Implementations
• Fixed declared length (aka static length)
  – Packed array padded with blanks
      Descriptor: String | Length=12 | Pointer to data
      Data:       A l l • a b o a r d ø ø
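
The fixed declared-length layout can be sketched in a few lines of Python. This is an illustration only, not a real runtime: the function name `assign_fixed` is hypothetical, an ordinary space stands in for the pad character drawn as ø in the layout above, and the treatment of over-long values (truncation) follows the update rule stated on the next slide.

```python
def assign_fixed(value, length):
    """Store value in a fixed-length string field of the given declared length.

    Shorter values are padded on the right with blanks; longer values are
    truncated, so the result always occupies exactly `length` characters.
    """
    return value[:length].ljust(length)


field = assign_fixed("All aboard", 12)
print(repr(field))    # 'All aboard  '
print(len(field))     # 12
print(repr(assign_fixed("All aboard the train", 12)))   # 'All aboard t'
```

Because the length never changes, the descriptor can hold the length as a compile-time constant and the data can live in a packed array of exactly that size.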

String Implementations
• May need runtime descriptor for type and length if substring operations include runtime checks
• Update pads with blanks or truncates as necessary
• Variable length with declared maximum (aka limited dynamic length)
  – Packed array with runtime descriptor
      Descriptor: String | Max Length=12 | Cur Length=10 | Pointer to data
      Data:       A l l • a b o a r d
  – Descriptor may occur as initial block of data object for array
• Unbounded length (aka dynamic length)
  – Two standard implementations
  – First: linked list
      Descriptor: String | Curr Length=10 | Pointer to data
      Data:       chain of linked blocks holding A l l • a b o a r d
  – Second: null-terminated contiguous array
      Descriptor: String | Pointer to data
      Data:       A l l • a b o a r d
    • Must reallocate and copy when string grows

Arrays
• Ordered sequence of fixed number of objects all of the same type
• Indexed by integer, subrange, or enumeration type, called subscript
• Multidimensional arrays have one subscript for each dimension
• L-value for array element given by accessing formula

Type Checking Arrays
• Basic type: array
• Number of dimensions
• Type of components
• Type of subscript
• Range of subscript (must be done at runtime, if at all)

Array Layout
• Assume one dimension
      Descriptor: 1 dim array | Virtual Origin (VO) | Lower Bound (LB) | Upper Bound (UB) | Comp type | Comp size (E)
      Data:       A[LB] (at address α), A[LB+1], …, A[UB]
      A[0] would lie at the virtual origin VO

Array Component Access
• Component access through subscripting, both for lookup (r-value) and for update (l-value)
• Component access should take constant time (i.e. looking up the 5th element takes same time as looking up 100th element)

Array Access Function
• L-value of A[i] = $VO + (E \cdot i) = \alpha + (E \cdot (i - LB))$
• Computed at compile time
• $VO = \alpha - (E \cdot LB)$
• More complicated for multiple dimensions

Records
• Ordered sequence of fixed number of objects of differing types
• Indexed by fixed identifiers called labels or fields
• L-value for record element given by more complex accessing formula than for arrays

Typical Record Layout
      Descriptor: Record type | Num. of components | Comp 1 label | Comp 1 type | Comp 1 location = α | … | Comp n label | Comp n type | Comp n location
      Data:       R.1 | R.2 | … | R.n

Type Checking Record
• Basic type: record
• Number, name (label) of components
• Possibly order of labels
  – If order matters, labels must be unique
  – If order doesn’t matter, layout must give a canonical ordering
• Type of components per label

Record Layout
• Most of descriptor exists only at compile time
• Access function:
  – Comp i location given by
      L-value of R.i = $\alpha + \sum_{j=1}^{i-1} (\text{size of } R.j)$

Lists
• Ordered collection of variable number of elements
  – Many languages (LISP, Scheme, Prolog) allow heterogeneous lists
  – SML has only homogeneous lists
• Layout: linked series of cells (called cons cells) with descriptor, data and pointers
  – Data in first cell of list called head of list
  – R-value of pointer in first cell called tail of list
• Sequential access of data by following pointers
  – Access is linear in position in list
    • Takes twice as long to look up 10th element as to look up 5th element
• Adding a new element to list done only at head, called consing
  – Creates new cell with element to be added and pointer to old list (i.e. creates new list)

List Layout
• Example: [1,2.5,’a’]
      [Figure: three linked list cells holding int 1, real 2.5 and char ‘a’]
• Example: [[1,2.5],[’a’]]
      [Figure: an outer list of two cells; the first points to an inner list holding int 1 and real 2.5, the second to an inner list holding char ‘a’]

Union Types
• Set-wise the (discriminated) union of the component types
• Interchangeable with variant records as primitive type construct
• Elements chosen from one of component types
• Problem: if int occurs as two different components of union type, can we tell which component an int is for?
• Two kinds of union types:
  – Free union: no
  – Discriminated union: yes
• If each component is tagged to separate occurrences of same type, discriminated union, otherwise not

Union Layout
      Descriptor: Union type | Component type | Component tag | Component location
      Data:       Actual data | Unused space (total length L)
• No tag if free union
• L is fixed length of biggest component

Combining Data Structures
• Possible to have any of the above structures as components of others
• Since lists are of variable size, but arrays must store fixed-size elements, how to store lists in an array?
• Answer: cons cells have uniform size, store just the leading cons cell
• Example: data in 4-element array of lists
      [Figure: a 4-element array whose elements are the leading cons cells of lists of ints; the cells shown hold 5, 6, 3, 1, 2 and 7]

Type summary
• Static type checking takes place after syntax check and before code generation
• Some type checking can be necessary at run time
• Types vs. syntax
• Simply typed values and composite values
• User defined types
• Equivalence on types
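
As a worked illustration of the access functions from the Array Access Function and Record Layout slides, the following Python sketch computes l-values as plain integer addresses. It is a sketch under stated assumptions: the function names, the base address `alpha` and the component sizes are hypothetical values chosen for the example, and the subscript range check is included only to show where a runtime check would sit.

```python
def virtual_origin(alpha, elem_size, lower_bound):
    """VO = alpha - (E * LB): the address A[0] would have, even if 0 is out of range."""
    return alpha - elem_size * lower_bound


def array_lvalue(alpha, elem_size, lower_bound, upper_bound, i):
    """L-value of A[i] = VO + (E * i), which equals alpha + E * (i - LB)."""
    if not lower_bound <= i <= upper_bound:
        raise IndexError("subscript out of range")   # the optional runtime check
    return virtual_origin(alpha, elem_size, lower_bound) + elem_size * i


def record_lvalue(alpha, comp_sizes, i):
    """L-value of R.i = alpha + sum of the sizes of R.1 .. R.(i-1), with i counted from 1."""
    if not 1 <= i <= len(comp_sizes):
        raise IndexError("no such component")
    return alpha + sum(comp_sizes[: i - 1])


# Array A with bounds 10..20, 4-byte components, data starting at address 1000.
print(virtual_origin(1000, 4, 10))          # 960
print(array_lvalue(1000, 4, 10, 20, 12))    # 1008

# Record with components of sizes 4, 8 and 1, data starting at address 2000.
print(record_lvalue(2000, [4, 8, 1], 3))    # 2012
```

The array formula needs one multiplication and one addition regardless of `i`, which is why array access takes constant time, while the record offsets are sums of fixed sizes that a compiler can evaluate once per label.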
