Download CS503: First Lecture, Fall 2008

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Linked list wikipedia , lookup

Lattice model (finance) wikipedia , lookup

Quadtree wikipedia , lookup

Red–black tree wikipedia , lookup

B-tree wikipedia , lookup

Interval tree wikipedia , lookup

Binary tree wikipedia , lookup

Binary search tree wikipedia , lookup

Transcript
CS503: Eighth Lecture, Fall 2008
Binary Trees
Michael Barnathan
Project Ideas
•
•
General idea: something that won’t take more than a couple of weeks but that you
can use in a portfolio. These are just my ideas; feel free to come up with your own.
Web crawler.
– Sockets, Trees, Recursion.
– Caveats: dead links, status codes, redirects.
•
Fast file indexer based on frequent words.
– File I/O, Trees, Analysis of Algorithms.
– Compression (storing the index).
•
Works well because of Zipf/power law distributions.
– Caveats: binary files, errors opening files. Make sure you open read-only.
•
Trend analysis tool using regression.
– Sorting, Recursion, Analysis.
– Predictors: I can help you with these.
•
AI opponent for a simple game (such as checkers or Reversi)
– Trees, Storage, Recursion, Heuristic Search, Analysis of Algorithms.
•
Distributed command server (to order around a bunch of machines).
– Sockets, File I/O, Priority Queues (especially for synchronization), Analysis of Algorithms.
Grading
• Not having exams is going to require adjusting
the percentages a bit.
• Assignments 40%, Project 30%, Labs 20%,
Participation 10%?
Here’s what we’ll be learning:
• Data structures:
– Binary Trees.
– Binary Search Trees.
• Theory:
– Tree traversals.
• Preorder.
• Inorder.
• Postorder.
– Balanced and complete trees.
– Recursion on Binary Trees.
Linear Structures
• Arrays, Linked Lists, Stacks, and Queues are linear
data structures.
– Even circularly linked lists.
• One element follows another: there is always just
one “next” element.
• As we mentioned, these usually yield recurrences
of the form T(n) = T(n-1) + f(n).
• What if we violate this assumption? What if a
structure had two next elements?
– In a way, we started with the most restrictive
structure (arrays), which we are progressively relaxing.
Frankenstein’s Data Structures
Data
10
21
44
Index
0
1
2
Data
5
10
Index
1
2
Data
42
Index
2
It hardly even makes sense to talk
about a node having more than one
successor in an array. What would
data[2] be here? It’s not clear.
4
2
1
Linked lists make more sense, but
what is 1->next now? We need
more information.
5
3
6
7
Binary Trees
• There are now two next nodes, not one.
• Clearly, we need two pointers to model them.
• So let’s rotate that list…
1
Left
Right
3
2
4
5
6
7
• Let’s call the pointers “left” and “right”.
• This structure is known as a binary tree.
– Binary: branches into two nodes.
• Higher-order trees exist too; we’ll talk about these later.
– Why we call it a tree should be obvious.
Nomenclature
• Some from botany, some from genealogy.
• Root: The “highest” node in the tree; that is, the one without a
parent.
– Almost all tree algorithms start at the root.
• Child/Subtree: A node one level below the current node. Traversing
the left or right pointers will bring you to a node’s “left” or “right”
child.
• Leaf: A node at the “bottom” of the tree; i.e. one without children
(or really with two null children).
• Parent: The node one level above.
• Siblings: Nodes with the same parent.
• “Complete” tree: a tree where every node has either 0 or 2 children
and all leaves are at the same level. (Basically, it’s fully “filled in”).
Recursive Definition
• Binary trees have a nice recursive definition:
• A binary tree is a value, a left binary tree, and
a right binary tree.
• Thus, each individual node is itself a tree.
• Base case: the empty tree.
– Leaves’ left and right children are both empty.
– We usually represent this with nulls.
Node Access
• All you must store is a reference to the root.
• You can get to the rest of the nodes by
traversing the tree.
• Example: Accessing 5.
Root
1
• There’s a problem.
3
2
– Anyone see it?
4
5
6
7
Traversal
• We have no way of knowing where 5 is.
• In order to find it, we need to check every node in the tree.
• So what’s the complexity of access?
– “Check every” should ring alarm bells by now.
• This is a nonlinear data structure, so we have more than one way to
traverse, however.
• There are three common tree traversals:
– Preorder, inorder, and postorder.
• There are some more exotic ones, too: traversals based on pointer
inversion, threaded traversal, Robson traversal…
• Since binary trees are recursive structures, tree algorithms are usually
recursive. Traversals are no exception.
• Remember how we reversed the output of printTo(n) by moving the
output above or below the recursive call?
– It turns out you can change the order of the traversal in the same way.
Traversals and “Visiting”
• You can do anything to a
node inside of a tree
traversal algorithm!
• You certainly can search for a value.
• But you can also output its value, modify the its
value, insert a node there, etc.
• This generic action is simply called “visiting” the
node when discussing traversals.
Preorder Traversal
• If I gave you a binary tree and asked you to search for an element,
how would you go about it?
– You would check the value.
– You would search the left subtree.
– You would search the right subtree.
• This is how preorder traversal works.
– We check/output/do something with the node value.
– We recurse on the left subtree.
– We recurse on the right subtree.
• We stop once we run out of nodes.
• This is also called depth-first traversal, because it first focuses on
traversing down specific nodes before broadly visiting others.
– Like taking a lot of CS courses first before satisfying a core curriculum.
– Preorder traversal is a special case of depth-first search on data
structures called graphs, which we will discuss soon.
Preorder Traversal
BinaryTree preorder(BinaryTree node, int targetvalue) {
if (node == null)
//Base case.
return null;
else if (node.value == targetvalue)
//Found it.
return node;
}
BinaryTree lchild = preorder(node.left);
if (lchild != null)
return lchild;
//Traverse left.
return preorder(node.right);
//Traverse right.
//Found on the left.
Preorder Traversal: Illustration
Root
1
3
2
4
5
6
7
Order: [1 2 4 5 3 6 7]
The root is always the first node to be visited.
Inorder Traversal
• We have two recursive calls in the preorder traversal: left
and right.
• In preorder, we checked the node before calling either of
them.
• In an inorder traversal, we check in-between the two calls.
• We dive down all the way on the left before outputting,
then we visit the right.
– To use recursive stack language, we output after popping on the
left but before pushing on the right.
• There is no inherent advantage to choosing one traversal
over another on a regular binary tree unless you
deliberately want a certain ordering.
• However, inorder traversal is important on a variation of
the binary tree. More on that in just a moment.
Inorder Traversal
BinaryTree inorder(BinaryTree node, int targetvalue) {
if (node == null)
//Base case.
return null;
//All we did was swap the order of these two lines.
BinaryTree lchild = inorder(node.left);
//Traverse left.
if (node.value == targetvalue)
//Found it.
return node;
}
if (lchild != null)
return lchild;
//Found on the left.
return inorder(node.right);
//Traverse right.
Inorder Traversal: Illustration
Root
1
3
2
4
5
6
7
Order: [4 2 5 1 6 3 7]
The root is always the middle node.
Postorder Traversal
• The obvious next step: output after both
recursive calls.
• This causes the algorithm to dive down to the
bottom of the tree and output/visit the node
when going back up.
– Similar to what we did in printTo(), actually.
– We are outputting on the pop.
Postorder Traversal
BinaryTree postorder(BinaryTree node, int targetvalue) {
if (node == null)
//Base case.
return null;
BinaryTree lchild = postorder(node.left);
BinaryTree rchild = postorder(node.right);
if (node.value == targetvalue)
return node;
if (lchild != null)
return lchild;
//Found on the right or not at all.
return rchild;
}
//Traverse left.
//Traverse right.
//Found it.
//Found on the left.
Postorder Traversal: Illustration
Root
1
3
2
4
5
6
7
Order: [4 5 2 6 7 3 1]
The root is always the last node to be visited.
CRUD: Binary Trees.
•
•
•
•
Insertion:
Access:
Updating an element:
Deleting an element:
• Search/Traversal:
?
?
?
?
O(n).
• All three traversals are linear: They visit every node in sequence.
– They each just follow different sequences.
– You can search by traversing, so search is also O(n).
• How long would it take to access a node, though?
– If I knew I wanted the left child’s left child, how many pointers would I
need to follow to get to it?
Tree Height.
• To analyze worst-case access, we need to talk about tree
height.
• The height of a tree is the number of vertical levels it
contains, not including the root level.
• Or you can think of it as the number of times you’d have to
traverse down the tree to get from the root to the lowest
leaf node.
• Nodes in the tree are said to have a depth, based on how
many vertical levels they are down from the root.
–
–
–
–
The root itself has a depth of 0.
The root’s children have a depth of 1.
Their children have a depth of 2…
Etc.
• The height is thus also the depth of the lowest node.
Height
Root
1
Depth = 0
3
2
4
5
6
Depth = 1
7
Height = 2
Remember, don’t count the root level.
Depth = 2
Height Balance
• A tree is considered balanced (or height-balanced) if the
depth of the highest and lowest leaves differs by no more
than 1.
• This turns out to be an important property because it forms
a lower bound on the access time of the tree and lets us
find the height.
• Question: If we have n nodes in a balanced binary tree,
what is the height of the tree?
– floor(log2 n)
– Note that we had 7 nodes in the previous tree, but a height of 2.
The tree was full; adding an 8th node would take the height to 3.
• The time to access a node depends on the height, thus we
know it is O(log n) on a balanced tree.
Degeneracy
• Trees with only left or right pointers degenerate into
linked lists.
– Which gives you another perspective on why Quicksort
became quadratic with everything on one side of the pivot.
– Access on linked lists is O(n).
• Performance gets worse even as we approach this
condition, so we want to keep trees balanced.
1
2
3
1
2
3
CRUD: Balanced Binary Trees.
•
•
•
•
Insertion:
Access:
Updating an element:
Deleting an element:
• Search/Traversal:
?
O(log n).
?
?
O(n).
• Once we know where to insert, insertion is simple.
– Just add a new leaf there: O(1).
• However, discovering where to insert is a bit trickier.
– Anywhere that a null child used to be will work.
– We don’t want to upset the balance of the tree.
– A good strategy is to traverse down the tree based on the value of
each node. This creates a partitioning at each level.
Binary Tree Insertion
void insert(BinaryTree root, BinaryTree newtree) {
//This can only happen now if the user passes in an empty tree.
if (root == null)
root = newtree;
//Empty. Insert the root.
else if (newtree.value < root.value) {
//Go left if <.
if (root.left == null)
//Found a place to insert.
root.left = newtree;
else
insert(root.left, newtree);
//Keep traversing.
}
else {
//Go right if >=.
if (root.right == null)
root.right = newtree;
//Found a place to insert.
else
insert(root.right, newtree);
//Keep traversing.
}
}
Insertion Analysis
• This is similar to a traversal, but guided by the
value of the node.
• We choose left or right based on whether the
node is < or >=.
• We split into one subproblem of size n/2 each
time we traverse.
– What recurrence would we have for this?
– What would be the solution?
CRUD: Balanced Binary Trees.
•
•
•
•
Insertion:
Access:
Updating an element:
Deleting an element:
• Search/Traversal:
O(log n).
O(log n).
O(1).
?
O(n).
• If we’re already at the element we need to update, we can just change the
value, thus O(1).
– Note that we can say the same for insertion, but finding a place to put the
node is usually considered part of it.
• Deletion is quite complex, on the other hand.
– If there are no children, just remove the node – O(1).
– If there is one child, just replace the node with its child – O(1).
– If there are two… well, that’s the tricky case.
Deletion
• If we need to delete a node with two children, we
need to find a suitable node to replace it with.
• One good choice is the inorder successor of the
node, which will be the leftmost leaf of the right
child we’re deleting from.
– Inorder successor meaning the next node in an
inorder traversal.
• So our course is clear: inorder traverse, stop at
the next node, swap.
Deletion
void deleteWithTwoSubtrees(BinaryTree targetnode) {
if (targetnode == null)
return;
//Deleting a null is a no-op.
//Find the inorder successor and its parent.
BinaryTree inorder_succ;
BinaryTree inorder_parent = targetnode;
for (inorder_succ = targetnode.right; inorder_succ.left != null; inorder_succ =
inorder_succ.left)
inorder_parent = inorder_succ;
//Keep track of the parent.
//Set the value of the parent to that of the inorder successor…
targetnode.value = inorder_succ.value;
//Delete the inorder successor (here’s why we needed the parent):
inorder_parent.left = null;
}
CRUD: Balanced Binary Trees.
•
•
•
•
Insertion:
Access:
Updating an element:
Deleting an element:
• Search/Traversal:
O(log n).
O(log n).
O(1).
O(log n).
O(n).
• Finding the inorder successor requires time
proportional to the height of the tree. If the tree is
balanced, this is O(log n).
CRUD: Unbalanced Binary Trees.
•
•
•
•
Insertion:
Access:
Updating an element:
Deleting an element:
• Search/Traversal:
O(n).
O(n).
O(1).
O(1).
O(n).
• The worst sort of unbalanced tree is just a linked list.
• The deletion algorithm would always hit the second case (only one
child), so we’d never experience O(log n) behavior…
• But the insertion algorithm is not as efficient as that of a linked list.
– Unless we check for this condition explicitly, in which case we get O(1).
Binary Search Trees
• Binary Search Trees (BSTs) capture the notion
of “splitting into two”.
• Or, to use the Quicksort term, partitioning.
– The value of a node is the pivot.
– The left tree contains elements < the pivot.
– The right tree contains elements >= the pivot.
• They are simply binary trees that are kept
sorted in the manner stated above.
BSTs: What do they entail?
• Like priority queues and sorted arrays, binary search
trees are inherently sorted containers.
• This means inserting a sequence of elements and then
reading them back will get them out in sorted order.
– Ah, but this time we have three ways to read them back
out. All three can’t give us the same order.
– The elements of an inorder traversal are sorted in binary
search trees.
• It also means that we’ll have to do some extra work to
ensure that this guarantee is true.
– But will this work influence the asymptotic performance?
A Binary Search Tree
Root
4
6
2
1
3
5
7
Inorder traversal: [1 2 3 4 5 6 7]
< on the left, >= on the right.
CRUD: Binary Search Trees.
•
•
•
•
Insertion:
Access:
Updating an element:
Deleting an element:
O(log n).
O(log n).
?
O(log n).
•
•
Search:
Traversal:
O(log n).
O(n).
•
Search and traversal are no longer the same operation!
– Traversal is analogous to linear search: look at every element, one at a time, and try to find the
target.
– Search on a BST is analogous to binary search: the data is sorted around the value of the node
we’re at, so it guides us to eliminate half of the remaining elements at each step.
– Just like other unsorted containers, we have to traverse to search a standard binary tree. And
like other sorted containers, a BST lets us do a binary search.
•
Remember, BSTs are sorted in an inorder traversal.
– Therefore, the deletion algorithm we previously specified will preserve the ordering.
Access on a BST
• Use the same strategy we used in binary search:
– Compare the node.
– If the target value is less than the node’s value, go left
(eliminates the right subtree).
– If the target value is greater than the node’s value, go
right (eliminates the left subtree).
– If it’s equal, we’ve found the target.
– If we hit a NULL, the target isn’t in the tree.
• This exhibits the same performance: O(log n).
– If the tree is balanced. In the degenerate case, we are
binary searching a linked list, which is O(n).
Insertion on a BST
• The algorithm I gave you for insertion was
actually the BST insertion algorithm as well.
– That was one of the reasons why I chose that strategy,
although it does result in a fairly balanced tree if the
data distribution is uniform.
• In order to keep elements partitioned around the
pivot, we need to traverse left when the new
element has a value < the pivot and right when
it’s >=.
• It was O(log n) before, and it still is.
Deletion on a BST
• I also gave you the BST deletion algorithm.
• As the inorder traversal is in sorted order, the
inorder successor is the next element after the
one we’re deleting in sorted order.
• If we replace the element we’re deleting with
the next element in the sequence, the
sequence is still sorted.
– e.g., [ 1 3 5 8 13 ] after deleting 3 -> [ 1 5 8 13].
• It was O(log n) before, and it still is.
Updating a BST
• Ah, here’s something different.
• Updating unsorted containers is usually a constant-time operation,
while updating sorted containers usually takes longer.
• When we change the value of a node in a BST, we may be required
to change the node’s position in the tree to preserve the ordering.
– This is why updating sorted containers is usually a slow operation.
• No one seems to want to deal with updating these, so most sources
(including your textbook) just define it as “delete and reinsert”.
– Which fully works and is very simple to do.
– Don’t be afraid to do “quick and dirty” things if they don’t harm your
performance.
• So does this harm performance?
– Insertion is O(log n).
– Deletion is O(log n).
– Unless we can update in O(1) on a BST (we can’t), then no.
CRUD: Binary Search Trees.
•
•
•
•
Insertion:
Access:
Updating an element:
Deleting an element:
O(log n).
O(log n).
O(log n).
O(log n).
•
•
Search:
Traversal:
O(log n).
O(n).
•
This is the ultimate compromise data structure.
– Arrays, Lists, Stacks, and Queues all did some things in constant time and other things in linear
time.
– This does everything (except traversal, which is inherently a linear operation) in logarithmic
time.
– But remember, logarithmic time isn’t much worse than constant.
– So these are pretty good data structures.
•
As usual, there’s a catch…
The Important of Balance
• Every operation on a tree begins to degenerate when balance is
lost.
– And in the worst case, you end up with a less efficient linked list.
• Keeping the tree balanced is thus important.
– There is one who is prophesized to bring balance to the Force, but I
don’t think that includes your trees.
– So the burden falls on you, my young padawan.
• Since BSTs are the structural analogue of Quicksort, you may have
an idea of what insertion sequence will produce the worst case.
– Yep, sorted or inverse-sorted, just as in Quicksort.
• Most data is not arranged like this already, and on average, BSTs
stay fairly well balanced.
• But this is enough of a problem where various self-balancing
structures have been invented. We will discuss these next week.
A General Note
• Although I put numbers in most of my
examples, any sort of data can go in these.
– Strings, Objects, Employees.
• Caveat: When using Java’s sorted containers,
make sure your class implements Comparable.
– Java doesn’t give you a BinaryTree class outright,
but it does give you TreeSet and TreeMap.
– TreeMap in particular is very neat; check it out.
– We’ll do some things with these in Thursday’s lab.
“In all my life I’ll never see / a thing so beautiful as a tree.”
• The study of trees goes very deep.
– We’ve just scratched the surface.
– We’ll come back to self-balancing trees, heaps,
and perhaps splay trees.
• The lesson:
– Ideas are universal. They can come from your
study. They can come from outside of your study.
They can come from nature. They can come from
anywhere.
• Next class: Linear-time sorting, B+ trees, lab.