Data Structures — Lists
Background: The functional and imperative styles also differ in their approaches to composite data. Functional languages support a clean and high-level
view of composite data structures, freeing the programmer from the details of
memory management and pointer manipulation.
Lists are the workhorse, the main data structure, in all functional languages.
LISP is a family of programming languages that have similar syntax and underlying philosophy, of which Scheme is a member. LISP means LISt Processing.
In this lab, we use lists almost exclusively. However, in Scheme list is (almost)
a subtype of pair; hence we first briefly consider pairs.
Pairs: Pairs can be used to represent coordinates in the plane, rational numbers,
and so on. To construct a pair from two values, use (cons <exp> <exp>); ‘cons’
stands for ‘construct’. Any two values can be combined into a pair, including
pairs and functions, and any expressions can be used as the arguments.
(define fee (cons 1 2))
(define bee (cons fee 2))
(define gee (cons bee 5))
(define kee (cons + *))
In the LISP family, pairs are called ‘dotted pairs’. When you enter (cons 4
1) the value displayed is (4 . 1). This is a pair literal, also called an external
representation in Scheme terminology. (So are the sequences of digits that
represent numbers, such as 2, 3.14.) Pair literals can also be used in expressions
you type in, and in programs.
Pairs can be nested, so one can use pairs to construct elaborate structures.
Think of a pair created by (cons v1 v2) as a tree whose root is labeled with
cons but has no data, with left (right) child v1(v2). Each of the two can itself
be a pair, so the general picture is of a binary tree with data in the leaves. Each
internal node has two children; a leaf is a non-pair, that is, a piece of data. Inside the
system, pairs are indeed represented as trees.
The first component of a pair can be selected by (car <pair>); the second,
by (cdr <pair>).[1] For example, try (car gee) and (cdr gee). The test pair?
tells you whether a value is of the type pair.
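As a small sketch of these selectors, the following repeats the definitions of fee, bee and gee from above and takes gee apart. The comments show the values one would expect a standard Scheme system to display; the exact output format of the lab's system is an assumption.

```scheme
;; The nested pairs defined earlier in this section.
(define fee (cons 1 2))     ; (1 . 2)
(define bee (cons fee 2))   ; ((1 . 2) . 2)
(define gee (cons bee 5))   ; (((1 . 2) . 2) . 5)

(car gee)                   ; the pair bee: ((1 . 2) . 2)
(cdr gee)                   ; 5
(car (car gee))             ; the pair fee: (1 . 2)
(car (car (car gee)))       ; 1, a leaf of the tree
(pair? gee)                 ; #t
(pair? (cdr gee))           ; #f, since 5 is a leaf, not a pair
```

Each car or cdr moves one step down the tree, to the left or right child respectively.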
Lists: A list is either the empty list, denoted (), or a pair, consisting of an
element, called the head of the list, and a list, called its tail. This can be
described in BNF by
List ::= () | (element . List),
where | stands for ‘or’. To construct a list from an element and a list, use (cons
<exp> <exp>). The first <exp> is arbitrary, since there is no restriction on the
element, but the second must evaluate to a list; if not, the value obtained is
a legal pair but not a list. A list is obtained from its tail by adding the head
in front. A nonempty list is obtained from () by cons’ing values one after the
other.

[1] See the blue book, p. 85, for explanation and history of these funny names.
A list can be viewed as a sequence, in which the first element is the one
cons’d last, the second is the one cons’d before last, and so on. This view is
emphasized by the external representation used in Scheme: Type in the following:
(define foo (cons 1 ()))
(define boo (cons 2 foo))
(define goo (cons #t boo))
As you can see, the system’s response does not look like dotted pairs. List
values (literals) are (externally) represented in Scheme as (v1 ... vn), emphasizing their view as sequences. Even if you type in a dotted pair, if it happens
to be a list, the special list representation will be displayed.[2] Thus, if you work
only with lists, you never use or see dotted pairs.
The operations car, cdr are used to extract the head and tail, respectively,
of a nonempty list. Applying either of them to the empty list, (), causes a runtime error. Remember: the tail is always a list. In particular, if you apply cdr to a
list that has two elements, you obtain a list that contains the second element,
not the element itself. The operation null? tests if a list is empty, and list?
tests if a given value is a list.
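The behaviour of these operations can be sketched on the lists foo and boo from above. One assumption to note: the empty list is written '() here, using the quote notation explained later in this lab, because many Scheme systems reject a bare () in an expression.

```scheme
(define foo (cons 1 '()))   ; the list (1)
(define boo (cons 2 foo))   ; the list (2 1)

(car boo)          ; 2, the head
(cdr boo)          ; (1), a one-element list, not the number 1
(cdr foo)          ; (), the tail of a one-element list
(null? (cdr foo))  ; #t
(null? boo)        ; #f
(list? boo)        ; #t
(list? 7)          ; #f
```

Note that (car (cdr foo)) would be an error, since (cdr foo) is the empty list.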
In summary:
Lists ⊊ Pairs ∪ {()}.
We use only lists in this lab, until further notice. We have discussed pairs, to
give you sufficient information for discovering some logical errors in list programming. If your program contains (cons 4 3), the system will not complain—the
expression evaluates to a legal value, a pair; but you may have meant to construct a list! This kind of error is not caught by the system. So, remember, if
you see dots on the screen, probably there is an error in your program!
Instead of pairs, you can use 2-lists — lists that contain precisely two elements. E.g. you can use the list (2 3) instead of the pair (2 . 3).
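The difference between a pair and a 2-list can be checked with list?. The names oops and two-list below are hypothetical, chosen only for this sketch, and the empty list is written '() on the assumption that the system does not accept a bare ().

```scheme
(define oops (cons 4 3))                    ; a pair, displayed as (4 . 3)
(define two-list (cons 4 (cons 3 '())))     ; a 2-list, displayed as (4 3)

(pair? oops)       ; #t
(list? oops)       ; #f, the second component is not a list
(list? two-list)   ; #t
(cdr oops)         ; 3
(cdr two-list)     ; (3), again a list
```

A quick (list? ...) test on a suspicious value is one way to find where a dot crept into a program.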
A list may be heterogeneous, that is, it may contain elements of different
types. Indeed, goo contains a boolean and numbers. Also, elements in a list
may be lists, as seen in the example below, where the first component is itself
a list.
(define foofoo (cons goo foo))

[2] The representation of nested pairs will often be a mixture of the two styles, since the
system tries to use the special list representation as much as possible. Try (cons bee bee).
To access the second element of a list (assuming it has one), apply cdr to
obtain the tail, which is a list, then car to obtain the tail’s first element, as
in (car (cdr boo)), whose value is 1. Since such accesses are very common,
Scheme has abbreviations for compositions of up to four car and cdr operations, as in (cadr boo). Try them!
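A few of these abbreviations, sketched on the list goo from above (rebuilt here so the example is self-contained, with the empty list written '() on the assumption that a bare () is not accepted). The letters between c and r are read from right to left: cadr takes the cdr first, then the car.

```scheme
(define goo (cons #t (cons 2 (cons 1 '()))))   ; the list (#t 2 1)

(car goo)     ; #t, the first element
(cadr goo)    ; 2,  same as (car (cdr goo))
(cddr goo)    ; (1), same as (cdr (cdr goo))
(caddr goo)   ; 1,  same as (car (cdr (cdr goo)))
```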
List literals: (4 3 6) is a list literal. Can we type it in, as we type in the
number 3.4, or use it in expressions? Try typing it at the prompt! The reason
for what you see: There is a syntax clash here. Expressions and lists have the
form (e0 ... en ). Given such an input, the Scheme interpreter treats it like
an expression. But 4, the first element of the list (4 3 6), is not an operation
or function, hence the complaint.
You might believe this is a bug in the design of Scheme. Not so! It is a
feature, as will be clarified in due course.
The ambiguity is resolved as follows: A list literal in an input expression or
program must be in the following form: (quote (...)). Try
(quote (4 3 6))
The quote is not a regular operation. It is similar to a compiler directive in C. Its
purpose is to let the interpreter know that its argument is data, a literal, not an
expression. Scheme allows a short notation for it, as in '(4 3 6). Remember,
this is just a short notation, expanded by the interpreter to (quote (4 3 6)).
You can quote any data: numbers, booleans, and so on. Where there is no ambiguity
anyway, it does nothing: '34 is the same as 34. Lists and pairs must be quoted!
Note: A list literal has the form (v1 ... vn), not '(v1 ... vn).
The interpreter evaluates expressions, and prints the results — always values.
Thus, in its output there is no confusion between list literals and expressions,
and quote is not used for list literals.
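The following sketch collects a few quoted literals. The comments give the values one would expect in a standard Scheme system; nothing here goes beyond quote, car, cons and equal?.

```scheme
(quote (4 3 6))     ; the list (4 3 6)
'(4 3 6)            ; the same list, in the short notation
(car '(4 3 6))      ; 4
'(1 . 2)            ; a pair literal: (1 . 2)
(equal? '34 34)     ; #t, quoting a number changes nothing
'(+ 1 2)            ; a three-element list, not the number 3

;; A quoted list literal equals the list built step by step with cons.
(equal? '(4 3 6) (cons 4 (cons 3 (cons 6 '()))))   ; #t
```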
List Programming: Now, finally: (i) What are lists good for, and (ii) how do
we program with them? For (i), the answer is: For all your data storage and
manipulation needs. There is nothing else we use for that in this lab. Examples
will be given in the sequel. As for (ii), we show now some examples.
Suppose we want to write a program to count the number of elements in a
list. This is actually a built-in function, called length in Scheme, so we call our
version my-length. We define it using induction on the structure of the list.
Here is the specification, in pseudo-code (using standard mathematical notation
for function application):
my-length(ls) is
  case ls of
    ()           => 0
    cons(x, ls') => add1(my-length(ls'))
This can easily be converted to Scheme code:
;; counts the elements in a list
(define my-length (lambda (ls) ;; list -> integer
  (if (null? ls)
      0
      (add1 (my-length (cdr ls))))))
This program works for lists, independently of the types of the elements, even
for heterogeneous lists. A program in this form is often called a fold, since it
‘folds’ the elements of the list into one value. Another common name for this
general pattern is accumulate.
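To illustrate the same fold pattern with a different result, here is a minimal sketch that adds up the numbers in a list. The name my-sum is hypothetical and not part of the lab; the code uses + rather than add1, since add1 is not available in every Scheme system.

```scheme
;; adds up the elements of a list of numbers
(define my-sum
  (lambda (ls)               ;; list(number) -> number
    (if (null? ls)
        0                                  ; empty list: nothing to add
        (+ (car ls) (my-sum (cdr ls))))))  ; head plus the sum of the tail

(my-sum '(3 4 5))   ; 12
(my-sum '())        ; 0
```

Only two things differ from my-length: the value for the empty list stays 0, and the step combines the head itself, rather than 1, with the result for the tail.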
In evaluation of expressions, in our model, list operations are evaluated just
like other operations. That is, first the operand(s) are evaluated, then the
operation is applied in one step. As an example, we consider an application of
the function above to a list.
(my-length '(3 4 5))                        ;; lookup my-length
→ ((lambda (ls) ... ) '(3 4 5))             ;; substitute
→ (if (null? '(3 4 5))
      0
      (add1 (my-length (cdr '(3 4 5)))))
→ (if #f
      0
      (add1 (my-length (cdr '(3 4 5)))))
→ (add1 (my-length (cdr '(3 4 5))))
→+ 3
For clarity in our simulation, we keep the quotes on literal lists. In the first step
we look up the meaning of my-length in the environment. In the second, we
substitute the actual list parameter for all occurrences of the formal parameter
ls in the body of the function. Then we proceed to evaluate the resulting
expression.
Here is another list program. The prefix select follows the tradition in the
database area for such functions. We could have used filter as well.
;; selects the even numbers in a list of integers
(define select-even (lambda (ls)
  ;; list(integer) -> list(integer)
  (if (null? ls)
      ()
      (if (even? (car ls))
          (cons (car ls) (select-even (cdr ls)))
          (select-even (cdr ls))))))
This also uses induction on the list structure, but the treatment of the non-null
case is a bit more complex.
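The same structure works for any test, not only even?. As a sketch, the function below takes the test as an extra parameter. The names my-select and keep? are hypothetical, and the empty list is written '() on the assumption that the system does not accept a bare ().

```scheme
;; selects the elements of ls for which keep? returns true
(define my-select
  (lambda (keep? ls)         ;; (value -> boolean), list -> list
    (if (null? ls)
        '()
        (if (keep? (car ls))
            (cons (car ls) (my-select keep? (cdr ls)))   ; keep the head
            (my-select keep? (cdr ls))))))               ; drop the head

(my-select even? '(1 2 3 4 6))   ; (2 4 6)
(my-select odd? '(1 2 3 4 6))    ; (1 3)
(my-select even? '())            ; ()
```

With this definition, (my-select even? ls) gives the same result as (select-even ls).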
In summary, the universe of lists has an inductive structure, similar to the
natural numbers. Just as these are obtained from 0 by the successor function,
so are lists obtained from () by cons’ing elements. List programs are defined
by induction, and should contain a treatment of the cases where the input is
empty or nonempty.
For problems with two list parameters, or a list and an integer parameter, it
is often (not always) the case that the solution uses induction on both parameters,
hence is more complex. As an example, here is a program that truncates a list
and leaves only the first n elements, n ≥ 0. If the list is shorter than n, the full
list is returned. We use induction on n, and secondary induction on the list:
;; selects the first n elements of list ls,
;; returns ls, if it contains fewer than n elements
take-some(n,ls) is ;; integer, list -> list
  case n of
    0   => ()
    k+1 => case ls of
             ()          => ()
             cons(x,ls') => cons(x,take-some(k,ls'))
When converted to Scheme code, it has a nested if similar to the one in the
previous example. A better structure is obtained by using cond.
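One possible conversion using cond is sketched below. It assumes n ≥ 0, as in the specification, writes the empty list as '() (on the assumption that a bare () is not accepted), and uses (- n 1) for the step from k+1 to k.

```scheme
;; selects the first n elements of list ls,
;; returns ls, if it contains fewer than n elements
(define take-some
  (lambda (n ls)             ;; integer, list -> list
    (cond ((= n 0) '())                     ; case n = 0
          ((null? ls) '())                  ; case n = k+1, ls empty
          (else (cons (car ls)              ; case n = k+1, ls nonempty
                      (take-some (- n 1) (cdr ls)))))))

(take-some 2 '(7 8 9))   ; (7 8)
(take-some 5 '(7 8 9))   ; (7 8 9)
(take-some 0 '(7 8 9))   ; ()
```

The three cond clauses correspond one to one to the three cases of the pseudo-code, which is what makes cond read better here than a nested if.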
A few more list operations: (list <exp1> ... <expn>) accepts any number
of arguments, and returns a list of their values. It is a convenient short notation
for (cons <exp1> (cons ... (cons <expn> ())...)). The operation append
concatenates two lists into one. Beware of confusing cons with append. Their
difference is reflected in their types. Can you write their types? The operation
reverse returns its argument list, in reverse order. Finally, map accepts two
parameters, a function f and a list ls. Its result is the list obtained from ls by
applying f to each of its elements.
(map add1 '(1 2 3)) →∗ (2 3 4)
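The operations above can be tried side by side, as in the sketch below. The squaring function passed to map is a hypothetical example, and the empty list is written '() on the assumption that a bare () is not accepted. Note in particular the different results of cons and append on the same two arguments.

```scheme
(list 1 (+ 1 1) 3)                      ; (1 2 3), arguments are evaluated
(cons 1 (cons (+ 1 1) (cons 3 '())))    ; the same list, built with cons

(append '(1 2) '(3 4))                  ; (1 2 3 4), one flat list
(cons '(1 2) '(3 4))                    ; ((1 2) 3 4), the first list becomes the head

(reverse '(1 2 3))                      ; (3 2 1)
(map (lambda (x) (* x x)) '(1 2 3))     ; (1 4 9)
```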
Lab summary — Scheme constructs: cons, car, cdr, pair?, (), null?,
list?, list, append, reverse, map, quote. (Of these, pair? is not to be
used for now.)
Optional reading: Blue Book, pp. 85-86, then section 2.2 to p. 103. Beware,
the book presents pairs and lists as structures composed of cells and pointers.
Think of that as the implementation, not the programmer’s model.