CS503: Fifth Lecture, Fall 2008
Recursion and Linked Lists
Michael Barnathan

Here's what we'll be learning:
• Theory:
  – Recursion.
  – Recursive data structures.
  – The "Divide and Conquer" paradigm.
  – Memoization (as in "memo" + "ization").
• Data Structures:
  – Linked Lists.
• We are going to keep coming back to recursion throughout the semester.
  – But it should be easier for you each time we cover it.
  – We'll stop covering it when you're sufficiently familiar with it.

Recursion: Definition
• A function is recursive if it calls itself as a step in solving a problem.
  – Why would we want to do this?
• A data structure is recursive if it can be defined in terms of a smaller version of itself.
• Recursion is used when the problem can be broken down into smaller instances of the same problem.
  – Or "subproblems".
  – These subproblems are more easily solved.
    • Often by breaking them into even smaller subproblems.
    • Of course, at some point, we have to stop.
• The solutions to these subproblems are then merged into a solution for the whole problem.

Recursive Structures: Example
• Before we discussed sorting, I asked you how you would sort a 3-element array.
• We couldn't figure that out immediately, so I asked how to do it on a 2-element array.
  – Compare the elements and swap.
• Then I asked you how to extend that to a 3-element array.
  – Do two comparisons.
• A size-n array can be recursively defined as a single element plus an array of size n-1.
  – The sorting problem was easier to solve for small arrays.
  – By extending the (easier) solution from small to large, we came up with a general sorting algorithm (bubble sort).

Recursion: Why?
• Oftentimes, a problem can be solved by reducing it to a smaller version of itself.
• If your solution is a function, you can call that function inside of itself to solve the smaller problems.
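The small-to-large reduction described above can be sketched as a recursive bubble sort: one comparison pass bubbles the largest element to the end, and the remaining n-1 elements form a smaller instance of the same problem. This is an illustrative sketch only; the lecture does not show this code, and the class and method names are mine.

```java
import java.util.Arrays;

// Sketch: bubble sort expressed recursively, matching the reduction in the
// slides (a size-n array = an array of size n-1 plus one element).
public class RecursiveBubbleSort {
    static void sort(int[] a, int n) {
        if (n <= 1)                          // Base case: 0 or 1 element is already sorted.
            return;
        for (int i = 0; i < n - 1; i++)      // One pass bubbles the max to a[n-1].
            if (a[i] > a[i + 1]) {
                int tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
            }
        sort(a, n - 1);                      // Reduction: sort the remaining n-1 elements.
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 2};
        sort(a, a.length);
        System.out.println(Arrays.toString(a)); // [1, 2, 3]
    }
}
```

Each call makes exactly one recursive call on a problem one element smaller, which is the "one piece" pattern discussed later in the lecture.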
• There are two components of a recursive solution:
  – The reduction: this generates a solution to the larger problem from the solutions to the smaller problems.
  – The base case: this solves the problem outright when it's "small enough".
• The key:
  – You won't be able to follow what is going on in every recursive call at once; don't think of it this way.
  – Instead, think of a recursive function as a procedure for reducing a problem to a smaller instance of itself. Only when the problem is very small do you attempt a direct solution.

Example:
• Using a loop, print all of the numbers from 1 to n.

    void printTo(int n) {
        for (int i = 1; i <= n; i++)
            System.out.println(i);
    }

• Now do it using recursion.
  – Stop and think before coding. Never rush into a recursive algorithm.
  – Problem: print every number from 1 to n.
  – How can we break this problem up?
    • Print every number from 1 to n-1, then print n.
    • "Print n" is easy: System.out.println(n);
  – How can we print every number from 1 to n-1?
  – How about calling our solution with n-1?
    • This is going to call it with n-2…
      – Which is going to call it with n-3…
        » … and print n-3.
      – And print n-2.
    • And print n-1.
  – And print n.
  – And then we're done! Right?

Where does it end!?
• This is a fundamental question when writing any recursive algorithm: where do we stop?
• The easiest place to stop is when n < 1.
  – What do we do when n < 1?
  – Well, we've already outputted all of the numbers from 1 to n. That was the goal.
  – So we do nothing. We "return;".

So here it is.

    void printTo(int n) {
        if (n < 1)    //Base case.
            return;
        else {        //Recursive step: 1 to n is 1 to n-1, then n.
            printTo(n-1);            //Print 1 to n-1.
            System.out.println(n);   //Print n.
        }
    }

• Question: What if we printed n before calling printTo?
• Don't try to trace each call. Think about what this is doing.

The Reduction:
• We reduced the problem of printing 1..n to the problem of printing 1..n-1 and printing n.
  – A smaller instance of the same problem.
  – So we solved it using the same function.
• Because we take one element off the end at a time, each call reduces the problem by exactly one.
  – (A note on terminology: a function is "tail recursive" only when the recursive call is its last action. As written, printTo prints after the recursive call returns; printing n before the call would make it tail recursive, and would print the numbers in descending order.)
• When the problem became "small enough" (n < 1), we solved it directly.
  – By stopping, since the output was already correct.
  – If we kept going, we'd print 0, -1, …, which would be incorrect.
• That's how recursion works.
  – But this is a simple example.
• What's the complexity of this algorithm?

Splitting into pieces.
• What we just saw was a "one piece" problem.
  – We reduced the problem to one smaller problem: n -> (n-1).
  – These are the easiest to solve.
  – These solutions are usually linear.
• What if we split the problem into two smaller problems at each step?
  – Say we wanted to find the nth number in the Fibonacci series.

Recursive Fibonacci:
• The Fibonacci series is a series in which each term is the sum of the two previous terms.
  – Recursive definition.
• The first two terms are 1.
  – Base case (without this, we'd have a problem).
• It looks like this:
  – 1, 1, 2, 3, 5, 8, 13, 21, 34, …
• And here's its function:
  – F(n) = F(n-1) + F(n-2)
  – F(1) = F(2) = 1

Fibonacci: two-piece recursion.
• To find the nth Fibonacci number, we simply add the (n-1)th and (n-2)th Fibonacci numbers.
• Ok, so here's a Java function fib(int n):

    int fib(int n) {
    }

• What would we write for the base case?

Fibonacci base case

    int fib(int n) {
        if (n <= 2)
            return 1;
    }

• That was simple. The recursive part?

Fibonacci recursive step

    int fib(int n) {
        if (n <= 2)
            return 1;
        return fib(n-1) + fib(n-2);
    }

• Ok, that wasn't too bad.

Solving multi-piece recursion.
• In one-piece recursion, you often get the solution directly from the recursive call.
• But when you split into more than one piece, you must often merge the subproblem solutions.
  – In the case of Fibonacci, we did this by adding.
  – Sometimes it will be more complex than this.
• Recursion usually looks like this:
  – Call with smaller problems.
  – Stop at the base case.
  – Merge the subproblem solutions as we go back up.
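Before moving on, it's worth seeing just how much work this two-piece fib actually does. Here is the same function with a call counter bolted on; the counter, the class name, and the sample values are my additions, not the lecture's.

```java
// Sketch: the naive two-piece fib, instrumented to count how many times
// it is invoked. The counter is an addition for illustration.
public class NaiveFib {
    static long calls = 0;

    static int fib(int n) {
        calls++;
        if (n <= 2)                      // Base case: F(1) = F(2) = 1.
            return 1;
        return fib(n - 1) + fib(n - 2);  // Two-piece reduction.
    }

    public static void main(String[] args) {
        System.out.println(fib(20));     // 6765
        System.out.println(calls);       // 13529 calls just to reach n = 20
    }
}
```

The call count for fib(n) works out to 2·F(n) − 1, so it grows as fast as the Fibonacci numbers themselves, which is exponential in n.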
Divide and Conquer
• The practice of splitting a problem up into manageable pieces, solving the pieces, and merging the solutions is called the "divide and conquer" algorithm paradigm.
• It is a "top-down" approach, in that you start with something big and split it into smaller pieces.
• Recursion isn't necessary for these algorithms, but it is often useful.

Memoization
• What if we wanted to analyze fib?
• Well, there's one call to fib(n)…
• Which makes two calls: fib(n-1) and fib(n-2)…
• Which make four calls: fib(n-2), fib(n-3), fib(n-3), and fib(n-4)…
• Which make eight…
• …
• Uh oh.

Why is it exponential?
• So this is O(2^n). Not good.
• And yet if you were to do it in a loop, you could do it in linear time.
• There's something wrong with the way we're calling that causes an exponential result.
  – A lot of work is being repeated.
  – We recompute fib(n-2) twice, fib(n-3) four times, fib(n-4) eight times… (roughly doubling at each level).
  – In fact, this repetition is the reason why it's exponential!
• Fortunately, we can reduce this.

Memoization
• "Memoization" (no "r") is the practice of caching subproblem solutions in a table.
• (Come to think of it, they could have left the "r" in and it would still be an accurate term.)
• So when we find fib(5), we store the result in a table. The next time it gets called, we just return the table value instead of recomputing.
  – So we save one call. What's the big deal?
  – fib(5) calls fib(4) and fib(3), fib(4) calls fib(3) and fib(2), fib(3) calls…
  – You actually just saved an exponential number of calls by preventing fib(5) from running again.

Implementing Memoization
• Use a member array for the table and wrap the actual work in a private function.
• The fib() function looks up the answer in the table and calls that function if it's not found.

    class Fibonacci {
        //The original slide sized this array with "n+1", but n isn't in scope
        //here; a fixed maximum works. Or use a Vector for dynamic sizing.
        private static final int MAX = 100;
        private static int[] fibresults = new int[MAX + 1];

        public int fib(int n) {
            if (fibresults[n] <= 0)        //Not in table.
                fibresults[n] = fib_r(n);  //Fill it in.
            return fibresults[n];
        }

        private int fib_r(int n) {         //This does the real work.
            if (n <= 2)                    //Base case.
                return 1;
            return fib(n-1) + fib(n-2);    //Note that we call "fib", not "fib_r".
        }
    }

The Gain
• When we store the results of that repeated work, this algorithm becomes linear.
• Finding F(50) without memoization takes 1 minute and 15 seconds on rockhopper.
• Finding F(50) with memoization takes 0.137s.
• The space cost of storing the table was linear.
  – Because we store one entry per value of n.
  – (Note: F(47) already overflows a Java int, so results this large really need a long.)

Optimal Solution to Fibonacci
• The Fibonacci series has a closed form.
• That means we can find F(n) in constant time:
  – F(n) = (phi^n - (-1/phi)^n) / sqrt(5)
  – phi is the Golden Ratio, approximately 1.618.
• It pays to research the problem you're solving.

Linked Lists
• We said that arrays were:
  – Contiguous.
  – Homogeneous.
  – Random access.
• What if we drop the contiguousness?
  – That is, adjacent elements in the list are no longer adjacent in memory.
• It turns out that you lose random access, but gain some other properties in return.

Linked Lists
• A linked list is simply a collection of elements in which each element points to the next: 1 -> 2 -> 3
• This is accomplished by storing a reference to the next node in each node:

    class Node<DataType> {
        public DataType data;
        public Node<DataType> next;
    }

Variations
• Doubly linked lists contain pointers to both the next and previous nodes: 1 <-> 2 <-> 3
  – The Java "LinkedList" class is doubly linked. It has an interface similar to Vector.
• Circularly linked lists are linked lists in which the last element points back to the first: 0 -> 1 -> 2 -> 3 -> (back to 0)
  – Seldom used; usually for "one every x" problems.
  – To traverse one of these, stop when the next element is equal to where you started.

CRUD: Linked Lists.
• Insertion: ?
• Access: ?
• Updating an element: ?
• Deleting an element: ?
• Search: ?
• Merge: ?
• Let's start with access and insertion.

Node Access
• Elements are no longer contiguous in memory.
• We can no longer jump to the ith element.
• Now we have to start at the beginning and follow the reference to the next node i times.
• Therefore, access is linear.
• This is called sequential access.
  – Because every node must be visited in sequence.
• Access element 3: 1 -> 2 -> 3

Node Insertion
• To insert into a list, all we need to do is point the new node at the node after the insertion point and change the "next" pointer of the node before it.
• This is a constant-time operation (provided we're already in position to insert).
• Example: insert "5" after "1", turning 1 -> 2 -> 3 into 1 -> 5 -> 2 -> 3:

    Node1.next = new Node(5, Node1.next); //Assumes a Node(data, next) constructor.

Merging Two Lists
• Linked lists have the unique property of permitting merges to be carried out in constant time (if you store a reference to the last node).
• In an array, you'd need to allocate a larger array and copy each of the two arrays into it.
• In a list, you simply change the pointer before the target and the last pointer in the 2nd list.
• Example: merge the list 6 -> 5 -> 4 into 1 -> 2 -> 3 at "2" (shown as a diagram in the original slides).

CRUD: Linked Lists.
• Insertion: O(1)
• Access: O(N)
• Updating an element: O(1)
• Deleting an element: ?
• Search: O(N)
• Merge: O(1)
• Binary search will not work on sequential-access data structures.
  – Moving around the list to find the middle is O(n), so we may as well use linear search.
• Updating a node is as simple as changing its value.
• That leaves deletion.

Deletion
• Insertion in reverse.
• Easier to do in a doubly-linked list.
• Store the node after the target.
• Remove the target node.
• Set the previous node's next to the stored node.
• Delete "5": 1 -> 5 -> 2 -> 3 becomes 1 -> 2 -> 3.

CRUD: Linked Lists.
• Insertion: O(1)
• Access: O(N)
• Updating an element: O(1)
• Deleting an element: O(1)
• Search: O(N)
• Merge: O(1)
• Dynamically sized by nature.
  – Just stick a new node at the end.
• Modifications are fast, but node access is the killer.
  – And you need to access the nodes before performing other operations on them.
• Three main uses:
  – When search/access is not very important (e.g. logs, backups).
  – When you're merging and deleting a lot.
  – When you need to iterate through the list sequentially anyway.

Les Adieux, L'Absence, Le Retour
• That was our first lecture on recursion.
• There will be others - it's an important topic.
• The lesson:
  – Self-similarity is found everywhere in nature: trees, landscapes, rivers, and even organs exhibit it. Recursion is not a primary construct for arriving at solutions, but a method for analyzing these natural patterns.
• Next class: Linked Lists 2, Stacks, and Queues.
• Begin thinking about project topics.
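To tie the linked-list slides together, here is one runnable sketch of the O(1) insert and delete operations discussed above. The (data, next) constructor and the helper names insertAfter/deleteAfter are my additions for illustration; the slides define only the Node fields.

```java
// Sketch: the slides' Node class plus the O(1) splice operations.
// Both operations take the *predecessor* node, since in a singly
// linked list only the previous node's pointer can be redirected.
public class ListOps {
    static class Node<T> {
        T data;
        Node<T> next;
        Node(T data, Node<T> next) { this.data = data; this.next = next; }
    }

    // Insert: splice a new node in after 'pos' by redirecting one pointer.
    static <T> void insertAfter(Node<T> pos, T value) {
        pos.next = new Node<>(value, pos.next);
    }

    // Delete: unlink the node after 'pos' by skipping over it.
    static <T> void deleteAfter(Node<T> pos) {
        if (pos.next != null)
            pos.next = pos.next.next;
    }

    public static void main(String[] args) {
        Node<Integer> head = new Node<>(1, new Node<>(2, new Node<>(3, null)));
        insertAfter(head, 5);   // 1 -> 5 -> 2 -> 3
        deleteAfter(head);      // back to 1 -> 2 -> 3
        for (Node<Integer> p = head; p != null; p = p.next)
            System.out.print(p.data + " ");  // 1 2 3
    }
}
```

Needing the predecessor is exactly why the slides note that deletion is easier in a doubly-linked list: there, any node can reach its own predecessor directly.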