COS 441: Final Exam
January, 2006

This final should be the individual work of each student in the class. Please do not talk to anyone other than the professor (David Walker) or TA (Aquinas Hobor) about the questions on this exam. Talking to anyone else about this exam between the 18th and 20th of January, 2006 constitutes a violation of Princeton's code of academic integrity. You may consult your lecture notes, or any of the textbooks listed as required or recommended on the course Web pages, but do not search for the answers on the general Web. If you need clarification on a question, please e-mail me or come to see me.

Details: You must complete the exam in a 24-hour period. Write down the time you download the exam and the time you hand it in at the top of the exam itself. You will work on the exam between the 18th and 19th of January, 2006. Students who requested a different time period in advance have been granted special consideration.

Total points = 75

Reminders:

Read questions completely before beginning your response. This exam has quite a lot of reading because part of the test is whether or not you are able to read and understand formal inference rules (typing rules and operational rules). Take your time reading the question. You should have plenty of time, so do not feel the need to jump in right away.

Always state your proof methodology clearly and explicitly before writing down the details of a proof. Points are automatically deducted from anyone who does not do this. Points will also be deducted for proofs that are unclear or poorly structured.

Always use the exact syntax of expressions, types, judgments, etc. that you are given in a question, or clearly define the abbreviations that you're using. When in doubt, avoid abbreviations. Definitely do not just start using some new, informal notation without defining it; graders will not be able to figure out your intent.
When defining a new type system, set of operational rules, etc., be sure to state the form of the new judgments in question before laying out the details of the rules (unless, of course, the question specifies the form of the judgment you should use).

If some aspect of the question is confusing or underspecified and I am unavailable, make some reasonable assumption and write down the assumption clearly before beginning the question.

Even if you can't come up with the correct solution for a question, write down as clear and concise an explanation of your thought process and partial work as you can to get partial credit. Do not leave a question entirely blank unless you know nothing about the topic.

When writing out your proofs, use plenty of space (either electronically or on paper) to make them easy to read. One of the best ways is to format each line with one true statement (judgment) on the left and the justification for that statement on the right (in terms of earlier statements and inference rules, etc.).

Good luck!

1. [20] Answer each question below concisely. Use no more than a sentence or two, a picture, a simple equation, a judgement or a typing rule, etc.

a) Give one brief reason or situation where strongly typed programming languages like ML or Java have an advantage over “weakly” typed languages like C.

b) Give one brief reason or situation where strongly typed programming languages like ML or Java have an advantage over languages that are not statically (i.e., compile-time) typed at all like Scheme or Lisp or scripting languages like Perl, Python, Ruby, PHP, and JavaScript.

c) What is the difference between a covariant subtyping rule and a contravariant subtyping rule?

d) When performing the type inference algorithm discussed in class, what part of the algorithm fails when you try to do type inference for this function: (fun id (x) = x x). Be as specific as possible.
The best answer has this form: “During _______, ______ fails”

e) In ML, we can write down the following data type definition:

   datatype tree = Leaf of int | Node of int * tree * tree

Using only int, recursive, sum and product types, encode this type.

f) Concisely describe the difference between an operational and a denotational semantics for a language.

g) Why might one choose to write down an operational semantics using “evaluation contexts” as opposed to the more standard way we did things at the beginning of class?

h) Is this a reasonable subtyping rule:

   ------------------------   (for any well-formed type t’)
   forall a. t <= t[t’/a]

Yes or no? Why or why not (briefly)? Hint: it might help to think about a specific example where t is a function type.

i) Is this a reasonable subtyping rule:

   ------------------------   (for any well-formed type t’)
   t[t’/a] <= forall a. t

Yes or no? Why or why not? Is it more or less reasonable than the rule in question h?

j) What’s your favorite type?

2. In class, we have studied type systems for high-level programming languages like C, Java and ML. However, it is possible to typecheck low-level languages, like those generated by compilers, to ensure they are safe as well. Java bytecode is an example of a relatively low-level, but type-safe language. However, it turns out that it is possible to develop type systems for even lower-level languages. In fact, even the assembly or machine language output of a compiler may be ascribed a safe type system! In this question, we will develop a simple type system for a very simple, idealized assembly language and prove it safe.

Our simple assembly language will only allow programmers to compute with integers (n) and code pointers (l, a meta-variable that ranges over the “locations” where code blocks are stored). Programs will do so by moving integers and code pointers in and out of registers (r) and jumping from one code block to the next.
Here is a summary of the basic syntax of our assembly language:

   instruction operands
      v ::= n | l | r          (integer or code pointer or register)

   instructions
      i ::= mov r, v           (move operand v into register r)
          | add rd, rs, v      (add rs to v and put contents in rd)
          | jz r, v            (conditional jump: if r is 0 then jump to v, else execute the following instruction)

   instruction sequences / code blocks
      I ::= i; I               (single instruction followed by a sequence)
          | j v                (unconditional jump to v)
          | halt r             (halt execution, print contents of r, which must be an integer)

   programs
      P ::= {l1: I1, ..., lk: Ik}   (a collection of code blocks I each associated with labels/code addresses l)

Here is a simple example program (PROG1) that computes the product of registers r1 and r2, placing the final result in r3 before jumping to a return address assumed to be in r4:

   // Assume:
   //   r1 : int, r2 : int
   //   r4 : return address
   // result produced in r3

   prod: mov r3, 0          // initialize result
         j loop

   loop: jz r1, done        // if r1 = 0 then goto done
         add r3, r2, r3     // result = r2 + result
         add r1, r1, -1     // r1 = r1 - 1
         j loop

   done: j r4

Next, as usual, we will define the operational semantics for the language. In this case, we will specify execution of our machine using a triple (P, R, I) where P is the program being executed, R is a register file that assigns values to registers (see below), and I is the sequence of instructions to be executed next. Intuitively, I represents the “program counter”, but in an idealized way, so that execution of these programs looks a little bit more like execution of high-level expressions in the lambda calculus or MinML.

   Register files
      R ::= {r1 = v1, ..., rk = vk}   (vi’s are not themselves registers. They may only be values: integers n or locations l)

Where appropriate, we will use R and P as if they were functions from registers to values and code locations to instruction sequences respectively.
R(ri) will be a value (provided ri does indeed contain a value in the current register file R) and P(li) will be an instruction sequence (provided li is indeed a code location in the current program P). I.e.:

   {r1 = v1, ..., rk = vk}(ri) = vi     (look up contents of register ri in register file)
   {l1: I1, ..., lk: Ik}(li) = Ii       (look up contents of code location li in program)

For the sake of convenience, we write “R underlined”, R(v), when v may be a register or may be some other non-register value like n or l:

   R(n) = n and R(l) = l   (i.e., R does not really do anything in these two cases, but using this notation helps us write down the operational semantics in a very clean and elegant style)
   R(r) = R(r)             (the underlined form on the left, ordinary lookup on the right; i.e., extract the value of r from the register file R)

One last operation we need is register file update: R[r1 = v] updates the contents of register r1. For example, the register file update {r1 = 3; r2 = 17}[r2 = 44] gives us this resulting register file: {r1 = 3; r2 = 44}.

Using these operations, we define the operational semantics:

   P(R(v)) = I
   ----------------------------   (jump)
   (P, R, j v) ---> (P, R, I)

   ------------------------------------------------   (move)
   (P, R, mov r, v; I) ---> (P, R[r = R(v)], I)

   R(r2) = n2     R(v) = n3     n1 = n2 + n3
   -----------------------------------------------------   (add)
   (P, R, add r1, r2, v; I) ---> (P, R[r1 = n1], I)

   R(r) = 0     P(R(v)) = I2
   ------------------------------------   (cond jump)
   (P, R, jz r, v; I) ---> (P, R, I2)

   R(r) ≠ 0
   -----------------------------------   (cond fall thru)
   (P, R, jz r, v; I) ---> (P, R, I)

Notice that programs can easily “get stuck” or “crash.” For example, in the add instruction, if r2 does not contain an integer value in R (maybe R does not even associate register r2 with anything at the current point in execution), the program will get stuck. Also, if v is some register, but that register does not contain an integer, the program will also get stuck.
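To make the register-file operations and the first stuck case concrete, here is a minimal Python sketch. It is an illustration only, not part of the exam's formal definitions: the names Label, Reg, lookup_operand and update are hypothetical, chosen for this example, and a register with no contents surfaces as a KeyError, which plays the role of a stuck machine.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical encoding (not from the exam): integers are Python ints,
# code locations are Label objects, register names are Reg objects.
@dataclass(frozen=True)
class Label:
    name: str


@dataclass(frozen=True)
class Reg:
    name: str


Value = Union[int, Label]         # register contents are never registers
Operand = Union[int, Label, Reg]  # v ::= n | l | r
RegFile = Dict[str, Value]        # R ::= {r1 = v1, ..., rk = vk}


def lookup_operand(R: RegFile, v: Operand) -> Value:
    """Operand lookup ("R underlined"): identity on n and l, lookup on r."""
    if isinstance(v, Reg):
        # A KeyError here models a stuck machine: r holds no value in R.
        return R[v.name]
    return v


def update(R: RegFile, r: Reg, v: Value) -> RegFile:
    """Register file update R[r = v]; returns a new file, R is unchanged."""
    return {**R, r.name: v}


# The update example from the text: {r1 = 3; r2 = 17}[r2 = 44]
R = {"r1": 3, "r2": 17}
R2 = update(R, Reg("r2"), 44)
print(R2)  # {'r1': 3, 'r2': 44}
```

Returning a fresh dictionary from update, rather than mutating R, keeps the sketch close to the mathematical reading of R[r = v], where the old register file still exists on the left-hand side of a step.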
As a second example of a stuck machine, consider execution of a jump instruction: (P, R, j l3). This machine will get stuck if l3 is not a label in the program P. (Note, this may not be exactly how a real machine actually works: the real machine might continue computing for a little while before it does something terrible like seg faulting or trying to read from an illegal address and getting a bus error, but as we normally do, we model this by pretending the machine gets stuck right away.)

Of course, we will be able to prevent the machine from ever getting stuck by type checking. Here are the types we will use:

   types
      t ::= int          (type of an integer)
          | code(G)      (type of a code pointer: in order to be allowed to jump to this code pointer, either directly or indirectly, the current register file must have type G)

   register file types
      G ::= {r1 : t1, ..., rk : tk}   (registers r1,...,rk have types t1,...,tk respectively)

   whole-program type
      H ::= {l1 : code(G1), ..., lk : code(Gk)}   (code locations l1,...,lk hold code blocks with types code(G1) ... code(Gk) respectively)

Here is a partial specification of the typing rules for these machines and their programs. You will have to fill in missing typing rules.

Judgement 1: H |-- v : t   (value, not register, v has type t)

   --------------   (int)
   H |-- n : int

   H(l) = code(G)
   -----------------------   (loc)
   H |-- l : code(G)

Judgement 2: H |-- R : G   (register file has type G)

   For all r in the domain of G, H |-- R(r) : G(r)
   -------------------------------------------------------   (regfile)
   H |-- R : G

(note: since we only type check the registers r in the domain of G, NOT necessarily all r in the domain of R, register files R may contain more things than appear in their type G)

Judgement 3: |-- P : H   (whole-program P has type H)

   P = {l1 : I1, ..., lk : Ik}
   H = {l1 : code(G1), ..., lk : code(Gk)}
   H |-- I1 : code(G1)   ...   H |-- Ik : code(Gk)
   ---------------------------------------------------------------------------   (prog)
   |-- P : H

Judgement 4: H; G |-- v : t   (operand v has type t)

   H |-- v : t
   ---------------   (op-val)
   H; G |-- v : t

   G(r) = t
   ---------------   (op-reg)
   H; G |-- r : t

Judgement 5: H |-- i : G1 => G2   (instruction i requires input register file typed by G1 and after execution, produces register file typed by G2)

   H; G1 |-- v : t
   --------------------------------------   (mov)
   H |-- mov r, v : G1 => G1[r : t]

(note: G1[r : t] updates the register file typing with r mapped to new type t; r could have had a completely different type before executing this move instruction.)

(there are missing rules for the other instructions)

Judgement 6: H |-- I : code(G)   (this code block has type code(G); in other words, before jumping to or otherwise entering this code block, the register file must have type G)

   H; G1 |-- v : code(G2)     G1 <= G2
   -----------------------------------------------------------   (jump)
   H |-- j v : code(G1)

   H |-- i : G1 => G2     H |-- I : code(G2)
   ------------------------------------------------------   (I seq)
   H |-- i; I : code(G1)

Judgement 7: G1 <= G2   (register files with type G1 are subtypes of register files with type G2)

(missing rules; uses Judgement 8)

Judgement 8: t1 <= t2   (t1 is a subtype of t2)

(missing rules; uses Judgement 7)

Judgement 9: |-- (P, R, I) ok   (machine state (P, R, I) executes safely and does not get stuck)

   |-- P : H     H |-- R : G     H |-- I : code(G)
   -----------------------------------------------
   |-- (P, R, I) ok

Questions follow [55 points]. Keep in mind that you do not have to do these questions in the order that they are given. If you find it more convenient you may certainly do them in the order that pleases you.

a) [5] Implement the (abstract) syntax of machine states, programs, register files, etc. in ML. It should be clear which ML definitions implement which sorts of things.
In other words, implement the formal theory as directly as possible using datatypes where appropriate. Do not worry about optimizing your representation in any way.

b) [10] Implement the operational semantics of the assembly language. Implement the operational judgment as a function as directly as possible.

c) [3] Define PROG2 in ML and execute it using your interpreter. What does it return?

   PROG2 =

   main: mov r1, 4
         mov r2, 3
         mov r4, exit
         j prod

   exit: halt r3

   prod: mov r3, 0          // initialize result
         j loop

   loop: jz r1, done        // if r1 = 0 then goto done
         add r3, r2, r3     // result = r2 + result
         add r1, r1, -1     // r1 = r1 - 1
         j loop

   done: j r4

(you should also do your own testing)

d) [5] Give the rest of the (sound) typing rules in judgement 5, one rule per instruction. You will have to prove your rules are sound in a second. If you find a mistake in your rules when you do your proof, of course you will come back and fix your answer.

e) [5] Give the missing subtyping rules in judgements 7 and 8. Use an algorithmic subtyping definition as opposed to a declarative subtyping judgment.

f) [5] Let G1 = {r1 : int; r2 : int; r3 : int; r4 : code({r3 : int})}. Let H1 = {prod : code(G1), loop : code(G1), done : code(G1)}. Show that PROG1 (the example program given earlier) has the type H1. In other words, give a full typing derivation for:

   |-- PROG1 : H1

g) [2] State an incorrect subtyping rule, one that involves an incorrect variance (co-, contra- or in-), and demonstrate that this subtyping rule is incorrect by giving a program that type checks and also crashes due to the incorrect rule.

h) [10] Assume the following lemmas are true (you don’t need to prove them but may use them in any later proofs):

Lemma 1 [Register Lookup Typing]
If |-- P : H, H |-- R : G and H; G |-- v : t then H; G |-- R(v) : t.

Lemma 2 [Canonical Values]
If |-- P : H and H |-- v : t then
   1. If t = int then v = n for some n.
   2. If t = code(G) then v = l for some l and l in Dom(H) and H |-- P(l) : code(G).
Lemma 3 [Canonical Operands]
If |-- P : H and H |-- R : G and H; G |-- v : t then
   1. If t = int then R(v) = n for some n.
   2. If t = code(G) then R(v) = l for some l and l in Dom(H) and H |-- P(l) : code(G).

Lemma 4 [Register Update]
If H |-- R : G and H |-- v : t then H |-- R[r = v] : G[r : t]

Now, prove Progress:

If |-- (P, R, I) ok then either
   1. there exist R’ and I’ such that (P, R, I) ---> (P, R’, I’), or
   2. I is halt r and R(r) = n for some integer n.

If you need other lemmas to prove Progress, state those lemmas and prove them. Your proof will be graded both on correctness and on clarity and structure.

i) [10] Assuming the lemmas given in part h, prove Preservation:

If |-- (P, R, I) ok and (P, R, I) ---> (P, R’, I’) then |-- (P, R’, I’) ok

Once again, if you need other lemmas, state and prove them.