Data Structures — Lists

Background: The functional and imperative styles also differ in their approaches to composite data. Functional languages support a clean, high-level view of composite data structures, freeing the programmer from the details of memory management and pointer manipulation. Lists are the workhorse, the main data structure, in all functional languages. LISP is a family of programming languages that share a similar syntax and underlying philosophy, and Scheme is a member of that family. LISP stands for LISt Processing. In this lab, we use lists almost exclusively. However, in Scheme, list is (almost) a subtype of pair; hence we first briefly consider pairs.

Pairs: Pairs can be used to represent coordinates in the plane, rational numbers, and so on. To construct a pair from two values, use (cons ⟨exp⟩ ⟨exp⟩); ‘cons’ stands for ‘construct’. Any two values can be combined into a pair, including pairs and functions, and any expressions can be used.

    (define fee (cons 1 2))
    (define bee (cons fee 2))
    (define gee (cons bee 5))
    (define kee (cons + *))

In the LISP family, pairs are called ‘dotted pairs’. When you enter (cons 4 1), the value displayed is (4 . 1). This is a pair literal, also called an external representation in Scheme terminology. (So are the sequences of digits that represent numbers, such as 2 and 3.14.) Pair literals can also be used in expressions you type in, and in programs.

Pairs can be nested, so one can use pairs to construct elaborate structures. Think of a pair created by (cons v1 v2) as a tree whose root is labeled with cons but carries no data, with left child v1 and right child v2. Each of the two children can itself be a pair, so the general picture is a binary tree with data in the leaves. Each internal node has two children; a leaf is a non-pair, that is, a piece of data. Inside the system, pairs are indeed represented as trees.
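The tree view described above can be made concrete with a small sketch. It reuses the definitions fee, bee, and gee from the text and assumes only that the system provides the standard procedures display and newline; the comments show the external representation each call prints.

```scheme
;; Nested pairs built with cons, as in the text.
(define fee (cons 1 2))
(define bee (cons fee 2))
(define gee (cons bee 5))

(display fee) (newline)   ; prints (1 . 2)
(display bee) (newline)   ; prints ((1 . 2) . 2)
(display gee) (newline)   ; prints (((1 . 2) . 2) . 5)

;; Tree view of gee: internal nodes are cons nodes, leaves carry the data.
;;
;;            cons
;;           /    \
;;        cons     5
;;       /    \
;;    cons     2
;;    /  \
;;   1    2
```

Each level of nesting in the printed value corresponds to one step down the left branch of the tree.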
The first component of a pair can be selected by (car ⟨pair⟩); the second, by (cdr ⟨pair⟩).¹ For example, try (car gee) and (cdr gee). The test pair? tells you whether a value is of the type pair.

¹ See the blue book, p. 85, for explanation and history of these funny names.

Lists: A list is either the empty list, denoted (), or a pair consisting of an element, called the head of the list, and a list, called its tail. This can be described in BNF by

    List ::= () | (element . List)

where | stands for ‘or’. To construct a list from an element and a list, use (cons ⟨exp⟩ ⟨exp⟩). The first exp is arbitrary, since there is no restriction on the element, but the second must evaluate to a list; if it does not, the value obtained is a legal pair but not a list. A list is obtained from its tail by adding the head in front. A nonempty list is obtained from () by cons’ing values one after the other. A list can be viewed as a sequence, in which the first element is the one cons’d last, the second is the one cons’d before last, and so on. This view is emphasized by the external representation used in Scheme. Type in the following, then evaluate foo, boo, and goo:

    (define foo (cons 1 ()))
    (define boo (cons 2 foo))
    (define goo (cons #t boo))

As you can see, the system’s response does not look like dotted pairs. List values (literals) are (externally) represented in Scheme as (v1 ... vn), emphasizing their view as sequences. Even if you type in a dotted pair, if it happens to be a list, the special list representation will be displayed.² Thus, if you work only with lists, you never use or see dotted pairs.

² The representation of nested pairs will often be a mixture of the two styles, since the system tries to use the special list representation as much as possible. Try (cons bee bee).

The operations car and cdr are used to extract the head and tail, respectively, of a nonempty list. Applying either of them to the empty list, (), causes a runtime error. Remember: the tail is always a list. In particular, if you apply cdr to a list that has two elements, you obtain a list that contains the second element, not the element itself. The operation null? tests if a list is empty, and list? tests if a given value is a list. In summary: Lists ⊊ Pairs ∪ {()}.

We use only lists in this lab, until further notice. We have discussed pairs to give you sufficient information for discovering some logical errors in list programming. If your program contains (cons 4 3), the system will not complain; the expression evaluates to a legal value, a pair. But you may have meant to construct a list! This kind of error is not caught by the system. So remember: if you see dots on the screen, there is probably an error in your program! Instead of pairs, you can use 2-lists, that is, lists that contain precisely two elements. For example, you can use the list (2 3) instead of the pair (2 . 3).

A list may be heterogeneous, that is, it may contain elements of different types. Indeed, goo contains a boolean and numbers. Also, elements in a list may be lists, as seen in the example below, where the first component is itself a list.

    (define foofoo (cons goo foo))

To access the second element of a list (assuming it has one), apply cdr to obtain the tail, which is a list, then car to obtain the tail’s first element, as in (car (cdr boo)), whose value is 1. Since these accesses are very common, Scheme has abbreviations, as in (cadr boo), for compositions of up to four car and cdr operations. Try them!

List literals: (4 3 6) is a list literal. Can we type it in, as we type in the number 3.4, or use it in expressions? Try typing it at the prompt! The reason for what you see: there is a syntax clash here. Expressions and lists both have the form (e0 ... en). Given such an input, the Scheme interpreter treats it as an expression. But 4, the first element of the list (4 3 6), is not an operation or function, hence the complaint. You might believe this is a bug in the design of Scheme. Not so!
It is a feature, as will be clarified in due course. The ambiguity is resolved as follows: a list literal in an input expression or program must be written in the form (quote (...)). Try

    (quote (4 3 6))

The quote is not a regular operation. It is similar to a compiler directive in C. Its purpose is to let the interpreter know that its argument is data, a literal, not an expression. Scheme allows a short notation for it, as in '(4 3 6). Remember, this is just a short notation, expanded by the interpreter to (quote (4 3 6)). You can quote any data: numbers, booleans, and so on. Where there is no ambiguity anyway, quote does nothing: '34 is the same as 34. List and pair literals must be quoted!

Note: A list literal has the form (v1 ... vn), not '(v1 ... vn). The interpreter evaluates expressions and prints the results, which are always values. Thus, in its output there is no confusion between list literals and expressions, and quote is not used for list literals.

List Programming: Now, finally: (i) What are lists good for, and (ii) how do we program with them? For (i), the answer is: for all your data storage and manipulation needs. There is nothing else we use for that in this lab. Examples will be given in the sequel. As for (ii), we now show some examples.

Suppose we want to write a program to count the number of elements in a list. This is actually a built-in function, called length in Scheme, so we call our version my-length. We define it using induction on the structure of the list. Here is the specification, in pseudo-code (using standard mathematical notation for function application):

    my-length(ls) is
      case ls of
        ()           => 0
        cons(x, ls') => add1(my-length(ls'))

This can easily be converted to Scheme code:

    ;; counts the elements in a list
    (define my-length
      (lambda (ls)                ;; list -> integer
        (if (null? ls)
            0
            (add1 (my-length (cdr ls))))))

This program works for all lists, independently of the types of the elements, even for heterogeneous lists.
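To illustrate the last claim, here is a small sketch that applies my-length to an empty, a homogeneous, a heterogeneous, and a nested list. Two assumptions are made for portability: add1 is defined explicitly, since not every Scheme system provides it, and the empty list is written in quoted form as '(), which every system accepts.

```scheme
;; add1 is built into some Scheme systems only; defining it keeps the sketch self-contained.
(define add1 (lambda (n) (+ n 1)))

;; counts the elements in a list (same definition as in the text)
(define my-length
  (lambda (ls)                ;; list -> integer
    (if (null? ls)
        0
        (add1 (my-length (cdr ls))))))

(display (my-length '()))            (newline)   ; prints 0
(display (my-length '(3 4 5)))       (newline)   ; prints 3
(display (my-length '(#t 2 1)))      (newline)   ; prints 3 (heterogeneous list)
(display (my-length '((#t 2 1) 1)))  (newline)   ; prints 2 (the inner list counts as one element)
```

The last call shows that my-length counts only top-level elements: it never looks inside an element, which is why the element types do not matter.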
A program in this form is often called a fold, since it ‘folds’ the elements of the list into one value. Another common name for this general pattern is accumulate.

In the evaluation of expressions, in our model, list operations are evaluated just like other operations. That is, first the operands are evaluated, then the operation is applied in one step. As an example, we consider an application of the function above to a list.

      (my-length '(3 4 5))                                          ;; lookup my-length
    → ((lambda (ls) ... ) '(3 4 5))                                 ;; substitute
    → (if (null? '(3 4 5)) 0 (add1 (my-length (cdr '(3 4 5)))))
    → (if #f 0 (add1 (my-length (cdr '(3 4 5)))))
    → (add1 (my-length (cdr '(3 4 5))))
    →+ 3

For clarity in our simulation, we keep the quotes on literal lists. In the first step we look up the meaning of my-length in the environment. In the second, we substitute the actual list parameter for all occurrences of the formal parameter ls in the body of the function. Then we proceed to evaluate the resulting expression.

Here is another list program. The prefix select follows the tradition in the database area for such functions. We could have used filter as well.

    ;; selects the even numbers in a list of integers
    (define select-even
      (lambda (ls)                ;; list(integer) -> list(integer)
        (if (null? ls)
            ()
            (if (even? (car ls))
                (cons (car ls) (select-even (cdr ls)))
                (select-even (cdr ls))))))

This also uses induction on the list structure, but the treatment of the non-null case is a bit more complex.

In summary, the universe of lists has an inductive structure, similar to that of the natural numbers. Just as the natural numbers are obtained from 0 by the successor function, so are lists obtained from () by cons’ing elements. List programs are defined by induction, and should contain a treatment of both cases: the input is empty, or it is nonempty.
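The two-case inductive pattern summarized above, one case for () and one for a cons, can be written once and reused. The sketch below is an illustration only: my-fold, fold-length, and fold-select-even are hypothetical names that are not among the lab's constructs, and the empty list is written as '() for portability.

```scheme
;; my-fold (hypothetical helper): the result for () is base; the result for a
;; nonempty list combines the head with the folded tail using f.
(define my-fold
  (lambda (f base ls)
    (if (null? ls)
        base
        (f (car ls) (my-fold f base (cdr ls))))))

;; my-length expressed as a fold: ignore the element, add 1 to the count of the tail.
(define fold-length
  (lambda (ls)
    (my-fold (lambda (x count) (+ count 1)) 0 ls)))

;; select-even expressed as a fold: keep the element only if it is even.
(define fold-select-even
  (lambda (ls)
    (my-fold (lambda (x rest) (if (even? x) (cons x rest) rest)) '() ls)))

(display (fold-length '(3 4 5)))                (newline)   ; prints 3
(display (fold-select-even '(1 2 3 4 5 6)))     (newline)   ; prints (2 4 6)
```

Both functions differ only in the base value and the combining function, which is exactly what the inductive definitions of my-length and select-even have in common.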
For problems with two list parameters, or a list and an integer parameter, it is often (not always) the case that the solution uses induction on both parameters, and hence is more complex. As an example, here is a program that truncates a list and leaves only the first n elements, n ≥ 0. If the list is shorter than n, the full list is returned. We use induction on n, and secondary induction on the list:

    ;; selects the first n elements of list ls,
    ;; returns ls, if it contains fewer than n elements
    take-some(n, ls) is           ;; integer, list -> list
      case n of
        0   => ()
        k+1 => case ls of
                 ()           => ()
                 cons(x, ls') => cons(x, take-some(k, ls'))

When converted to Scheme code, it has a nested if similar to the one in the previous example. A better structure is obtained by using cond.

A few more list operations: (list ⟨exp1⟩ ... ⟨expn⟩) accepts any number of arguments, and returns a list of their values. It is a convenient short notation for (cons ⟨exp1⟩ (cons ... (cons ⟨expn⟩ ()) ...)). The operation append concatenates two lists into one. Beware of confusing cons with append. Their difference is reflected in their types. Can you write their types? The operation reverse returns its argument list in reverse order. Finally, map accepts two parameters, a function f and a list ls. Its result is the list obtained from ls by applying f to each of its elements.

    (map add1 '(1 2 3)) →* (2 3 4)

Lab summary: Scheme constructs: cons, car, cdr, pair?, (), null?, list?, list, append, reverse, map, quote. (Of these, pair? is not to be used for now.)

Optional reading: Blue Book, pp. 85-86, then section 2.2 to p. 103. Beware, the book presents pairs and lists as structures composed of cells and pointers. Think of that as the implementation, not the programmer’s model.
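Returning to the take-some specification given earlier: the following is one possible conversion to Scheme using cond, offered as a sketch rather than as the lab's own solution. It assumes n is a nonnegative integer, as in the specification, and writes the empty list as '() for portability.

```scheme
;; selects the first n elements of list ls,
;; returns ls, if it contains fewer than n elements
(define take-some
  (lambda (n ls)                ;; integer, list -> list
    (cond
      ((= n 0) '())             ; case n = 0: nothing left to take
      ((null? ls) '())          ; case n = k+1, ls = (): list exhausted
      (else                     ; case n = k+1, ls = cons(x, ls')
       (cons (car ls) (take-some (- n 1) (cdr ls)))))))

(display (take-some 2 '(7 8 9)))   (newline)   ; prints (7 8)
(display (take-some 5 '(7 8 9)))   (newline)   ; prints (7 8 9)
(display (take-some 0 '(7 8 9)))   (newline)   ; prints ()
```

Each cond clause corresponds to one line of the pseudo-code, so the two levels of case analysis flatten into a single list of alternatives instead of a nested if.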