Binary Search Trees
A Generic Tree
[Figure: a generic tree with lettered nodes]

Nodes in a binary search tree (B-S-T) are of the form
– P: parent pointer
– Key
– Satellite data
– L: left child pointer
– R: right child pointer
The B-S-T has a root node, which is the only node whose parent pointer is NIL.

Binary Trees
Binary tree is
– a root
– left subtree (maybe empty)
– right subtree (maybe empty)
Representation: each node holds Data, a left pointer, and a right pointer.
[Figure: a binary tree with lettered nodes]

Properties
– max # of leaves:
– max # of nodes:
– average height for N nodes:

1. Is this binary tree complete? Why not?
(C has just one child, right side is much deeper than left)
2. What’s the maximum # of leaves a binary tree of depth d/height h can have? 2^d
3. What’s the max # of nodes a binary tree of depth d/height h can have? 2^(d+1) - 1
Minimum? 2^(d-1) + 1 ; 2^d
4. We won’t go into this, but if you take N nodes and assume all distinct trees of the nodes are equally likely, you get an average depth/height of SQRT(N).
Is that bigger or smaller than log N?
Bigger, so it’s not good enough! We will see we need to impose structure to get the bounds we want.
Implementations of Binary Trees
Representation
A representation of a binary tree data node is similar to a doubly linked list in that it has two pointers.
[Figure: binary tree nodes, each drawn as a data area with a left pointer and a right pointer]
The graphic of a node shows a data area, and a left and right pointer. Some trees only store data in the leaf nodes.
In this tree of five nodes, the following properties can be seen:
• there are 5 nodes
• 3 internal
• 2 leaf nodes
• there are a total of 10 pointers
• 6 of the pointers are null pointers

Here is an example of a union binary tree implementation where the internal nodes are of a different construct than the leaf nodes.
The internal nodes contain pointers, and a data element representing a mathematical operator.
The leaf nodes are data nodes only, and contain the operands.
4 * x * (2 * x + a) - c
For the above equation:
• Post-order traversal of the above tree will regenerate the post-fix notation of the equation.
• Pre-order traversal regenerates prefix notation
• In-order traversal regenerates the form depicted to the left of the tree
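The three traversals of the expression tree can be sketched as follows. This is a minimal illustration, not the slide's own union-based implementation: it assumes a single hypothetical `ExprNode` struct for both operators and operands, and the helper names `leaf`, `op`, `postfix`, `prefix`, `infix` and `example` are invented here. The in-order version adds full parentheses, since a bare in-order walk would lose the grouping of `(2 * x + a)`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type for this sketch: a leaf holds an operand,
// an internal node holds an operator and two children.
struct ExprNode {
    std::string token;
    std::unique_ptr<ExprNode> left, right;
};

std::unique_ptr<ExprNode> leaf(const std::string& operand) {
    auto n = std::make_unique<ExprNode>();
    n->token = operand;
    return n;
}

std::unique_ptr<ExprNode> op(const std::string& oper,
                             std::unique_ptr<ExprNode> l,
                             std::unique_ptr<ExprNode> r) {
    auto n = std::make_unique<ExprNode>();
    n->token = oper;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

bool isLeaf(const ExprNode* n) { return !n->left && !n->right; }

// Post-order: left, right, then the operator.
std::string postfix(const ExprNode* n) {
    if (isLeaf(n)) return n->token;
    return postfix(n->left.get()) + " " + postfix(n->right.get()) + " " + n->token;
}

// Pre-order: the operator, then left, then right.
std::string prefix(const ExprNode* n) {
    if (isLeaf(n)) return n->token;
    return n->token + " " + prefix(n->left.get()) + " " + prefix(n->right.get());
}

// In-order with explicit parentheses around every operator node.
std::string infix(const ExprNode* n) {
    if (isLeaf(n)) return n->token;
    return "(" + infix(n->left.get()) + " " + n->token + " " + infix(n->right.get()) + ")";
}

// Builds the tree for 4 * x * (2 * x + a) - c.
std::unique_ptr<ExprNode> example() {
    return op("-",
              op("*", op("*", leaf("4"), leaf("x")),
                      op("+", op("*", leaf("2"), leaf("x")), leaf("a"))),
              leaf("c"));
}
```

For the example tree, `postfix` yields `4 x * 2 x * a + * c -` and `prefix` yields `- * * 4 x + * 2 x a c`.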
Binary Search Tree
Dictionary Data Structure
Binary tree property
– each node has ≤ 2 children
– result:
• storage is small
• operations are simple
• average depth is small (normally)
Search tree property
– all keys in left subtree smaller than root’s key
– all keys in right subtree larger than root’s key
– result:
• easy to find any given key

Every BST satisfies the BST property:
i. If y is in the LEFT subtree of x then key[y] < key[x]
ii. If y is in the RIGHT subtree of x then key[y] > key[x].
Data in a B-S-T are always stored in such a way as to satisfy this property.

Examples
[Figure: example binary trees with integer keys; two of them are captioned "NEITHER IS A BINARY SEARCH TREE"]
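A check of the BST property can be sketched as below. The point it illustrates is that the property constrains whole subtrees, not just parent and child: a tree can look ordered at every link and still fail. The `Node` struct and the function names are assumptions made for this example, with distinct integer keys.

```cpp
#include <cassert>
#include <climits>

// Hypothetical minimal node for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// True if every key in the subtree lies strictly between lo and hi,
// with the bounds tightened on the way down.
bool isBstInRange(const Node* n, long long lo, long long hi) {
    if (n == nullptr) return true;
    if (n->key <= lo || n->key >= hi) return false;
    return isBstInRange(n->left, lo, n->key) &&
           isBstInRange(n->right, n->key, hi);
}

bool isBst(const Node* root) {
    return isBstInRange(root, LLONG_MIN, LLONG_MAX);
}
```

A tree with root 5, left child 4, and 7 as the right child of 4 passes every parent-child comparison, yet 7 sits in the left subtree of 5, so `isBst` rejects it.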
Binary Search Trees (BST)
The defining property of a BST is that each node has left and right links pointing to another binary search tree or to external nodes (which have no non-NIL links).
⇒ Compare key values in internal nodes with the search key and use the result to control progress of the search.
Insert A S E R C G H I N into an initially empty BST
⇒ Notice that each insertion follows a search miss at the bottom of the tree.
⇒ Insertion is as easy to implement as Search.

Run times of BST algorithms depend on the shape of the tree
Best Case: Tree is perfectly balanced ⇒ ~ log n nodes from root to the bottom
Worst Case: Could be n nodes from root to the bottom.
Searches on BST: On average require about 2 ln n (roughly 1.39 lg n) comparisons on a tree built from n randomly ordered keys.
Proof: # of compares = 1 + distance of node to the root
Adding over all nodes gives internal path length
If C_n = average internal path length of a BST with n nodes, then C_n satisfies the same recurrence as the comparison count of quicksort, which gives C_n ≈ 2 n ln n.

Sorting
If you look at a BST in the proper manner, it represents a sorted file
i.e. read the tree from left to right, ignoring the level (height) of the nodes in the tree
i.e. an In Order traversal of the tree (left subtree => root => right subtree)
BSTs are a dual model to quicksort:
Node at root corresponds to the pivot element
Insert { A S E R A H C G I E N X M P E A L } into an empty BST
Traversals
Many algorithms involve walking through a tree, and performing some computation at each node
Walking through a tree is called a traversal
Common kinds of traversal
• Pre-order
• Post-order
• Level-order
Consider the following pseudocode:
InOrder_Traversal(x)
  If x ≠ NIL
    Then InOrder_Traversal(Left[x])
         Print key[x]
         InOrder_Traversal(Right[x])
What is printed if this is applied to the B-S-T in the graph?
How long does the tree traversal take?
O(n) time for a tree with n items: each node is visited once and its value printed.
An InOrder_Traversal prints the node values in monotonically increasing order.
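The pseudocode above translates almost line for line into C++. This sketch assumes a hypothetical `Node` struct with integer keys and collects the keys into a vector instead of printing them, so that the result can be inspected.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal node for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Appends the keys of the subtree rooted at x to out: left subtree, root, right subtree.
void inOrder(const Node* x, std::vector<int>& out) {
    if (x == nullptr) return;      // the "x != NIL" guard
    inOrder(x->left, out);
    out.push_back(x->key);         // stands in for "Print key[x]"
    inOrder(x->right, out);
}
```

Each node is pushed exactly once, which is where the O(n) bound comes from.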
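In Order Listing
[Figure: BST with root 10; 5 and 15 are the children of 10; 2 and 9 are the children of 5; 7 is the left child of 9; 20 is the right child of 15; 17 and 30 are the children of 20]
In order listing:
2→5→7→9→10→15→17→20→30

Operations on a BST
Searching:
Find D in the preceding B-S-T.
What happens if we search for C?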
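Since the search routine itself did not survive in this transcript, here is a minimal iterative sketch of the usual approach: compare the search key with the current node and go left or right until the key is found or a null link is reached. The `Node` struct and the name `treeSearch` are assumptions for this example, using integer keys rather than the letters on the slide.

```cpp
#include <cassert>

// Hypothetical minimal node for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the node holding k, or nullptr on a search miss.
const Node* treeSearch(const Node* x, int k) {
    while (x != nullptr && x->key != k) {
        x = (k < x->key) ? x->left : x->right;
    }
    return x;
}
```

A search for a missing key ends at the null link where that key would be inserted.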
Maximum and Minimum: Very straightforward from the structure of the B-S-T
Tree_Minimum(x)
  While left[x] not null
    Do x ← left[x]
  Return x
Tree_Maximum(x)
  While right[x] not null
    Do x ← right[x]
  Return x
How long does each procedure take to run?
O(h) where h = height of the tree.
Just traveling down the tree one level at a time.
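A direct C++ rendering of the two procedures, assuming a hypothetical `Node` struct and a non-null starting node:

```cpp
#include <cassert>

// Hypothetical minimal node for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Follow left links to the bottom; x must not be null.
const Node* treeMinimum(const Node* x) {
    while (x->left != nullptr) x = x->left;
    return x;
}

// Follow right links to the bottom; x must not be null.
const Node* treeMaximum(const Node* x) {
    while (x->right != nullptr) x = x->right;
    return x;
}
```

Neither loop compares keys; the shape of the tree alone determines the answer.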
Successor and Predecessor:
If all keys are distinct, then the successor of a node x is the node with the smallest key greater than key[x].
If all keys are distinct, then the predecessor of a node x is the node with the largest key less than key[x].
The structure of the B-S-T allows determination of the successor without any comparison of keys:
Tree_Successor(x)
1. If right[x] not null
2.   Then return Tree_Minimum(right[x])
3. y ← p[x]
4. while y not null and x = right[y]
5.   do x ← y
6.      y ← p[y]
7. return y
What is happening in the situation when the node has no right subtree?
In this case, if x has a successor then it is the lowest ancestor of x whose left child is also an ancestor of x (a node counts as its own ancestor here).
⇒ To find the successor in this case, move up the tree from x until you reach a node that is the left child of its parent; that parent is the successor.
1. Find successor of 15
   right[x] is not null,
   so execute a call to Tree_Minimum on right[15], which
   points to 18; the call returns the 17 node.
2. Find successor of 13 (right[13] is null)
   i. y gets p[x] and points to the 7 node
   ii. y is not null and x = right[y]
   iii. x2 set to point to the 7 node; y2 set to point to the 6 node
   iv. y2 is not null and x2 = right[y2]
   v. x3 set to point to the 6 node; y3 set to point to the 15 node
   vi. y3 is not null but x3 = left[y3], not right[y3], so the loop ends
   vii. return y3, the 15 node
⇒ As long as we move up and to the left in the tree, we visit smaller keys
⇒ Our successor is the node of which we are the predecessor
What is the running time?
In either case we follow a path up the tree or down the tree (and only one of these paths)
⇒ O(h) run time.
What would code look like for the predecessor of x ?
Theorem : The dynamic set operations :
Search, Minimum, Maximum, Successor and
Predecessor
can run in O ( h ) time on a B-S-T of height h.
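Tree_Successor can be sketched in C++ with parent pointers as follows. The `Node` struct and the `link` helper are assumptions made for this example. A tree consistent with the worked example would have 15 at the root with children 6 and 18, 7 as the right child of 6, and 13 as the right child of 7; the exact shape of the original figure is not recoverable from the transcript.

```cpp
#include <cassert>

// Hypothetical node with a parent pointer, as in the P / Key / L / R layout.
struct Node {
    int key;
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Helper for building small trees by hand: attaches children and sets their parents.
void link(Node& p, Node* l, Node* r) {
    p.left = l;
    if (l != nullptr) l->parent = &p;
    p.right = r;
    if (r != nullptr) r->parent = &p;
}

Node* treeMinimum(Node* x) {
    while (x->left != nullptr) x = x->left;
    return x;
}

// Returns the node with the next larger key, or nullptr if x holds the maximum.
Node* treeSuccessor(Node* x) {
    if (x->right != nullptr) return treeMinimum(x->right);
    Node* y = x->parent;
    while (y != nullptr && x == y->right) {   // climb while we are a right child
        x = y;
        y = y->parent;
    }
    return y;
}
```

The predecessor is the mirror image: swap `left` and `right`, and use the maximum of the left subtree.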
Insertion and Deletion
These operations cause the dynamic set represented by the B-S-T to change. Changes are made so that the B-S-T property is preserved.

Idea behind Insertion
1. goal of the algorithm is to find a place to insert a new node
2. similar to the search code but with a few twists
3. as you go keep two pointers: one to where you are; one to where you have been (to allow for a quick connection)
4. trace a path from the root to a null; this locates where the node will go
5. what if there is no tree? set this “new” node to be the root
What if the input string is: B D F H J L and no tree exists at first insert?

Insertion:
Begin at the root and trace a path downward in the tree
- x traces the path; y retains the pointer to the parent of x
- directional choices are determined by the compare key[x] vs key[z], until x is set to nil
- nil occupies the location where z is to be stored
- Running time: as with the others, this is O(h)

Deletion
This operation is a bit more complicated; it depends basically on whether the node to be deleted, z, has:
- No children
  In this case, remove the node by changing its parent, p[z], so that NIL replaces z as its child
- A single child
  Remove z and create a “spliced link” from the parent, p[z], to the child of z
- Two children
  A bit more complicated: find the successor y of z, replace the contents of z with the contents of y, then remove y. In this case the successor is the minimum of z's right subtree, and so that successor has no left child

Deleting a Node
1. If the node is an external node, simply replace it with a NIL value
2. If it is an internal node, then it has 1 or 2 children that cannot simply be orphaned; they need to be reattached to the BST while preserving the BST property.
Case 1: node has one child:
Replace the node with its child, so that the child's whole subtree moves up
Case 2: node has two children:
Find the successor (or predecessor) of the node to be removed, replace the node's value (key) with that of the successor (or predecessor), then move to the earlier cases to resolve any created orphans.
Tree_Delete ( T , z )
Theorem : The dynamic set operations Insert
and Delete can run in O( h ) time, in a binary
search tree of height h.
Note : h not n
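The insertion procedure described above, with x tracing the path and y trailing one step behind, can be sketched as follows. The `Node` struct, the `height` helper, and the convention that `treeInsert` returns the root are assumptions for this example.

```cpp
#include <cassert>

// Hypothetical node with a parent pointer and a character key.
struct Node {
    char key;
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(char k) : key(k) {}
};

// Links node z into the tree rooted at root and returns the (possibly new) root.
Node* treeInsert(Node* root, Node* z) {
    Node* y = nullptr;   // where we have been
    Node* x = root;      // where we are
    while (x != nullptr) {
        y = x;
        x = (z->key < x->key) ? x->left : x->right;
    }
    z->parent = y;
    if (y == nullptr) return z;          // there was no tree: z becomes the root
    if (z->key < y->key) y->left = z;
    else y->right = z;
    return root;
}

// Height in edges; an empty tree has height -1 in this sketch.
int height(const Node* n) {
    if (n == nullptr) return -1;
    int l = height(n->left);
    int r = height(n->right);
    return 1 + (l > r ? l : r);
}
```

Inserting B D F H J L in that order gives a chain in which every node has only a right child, so the height is 5 for 6 nodes. This is why the bound is stated in terms of h rather than n.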
Sorting:
Sort(A)
  for i ← 1 to n
    do Tree_Insert(A[i])
  InOrder_Traversal(root)
What should you expect for a lower bound on the run time?
Ω(n lg n)
Why? Is this a comparison based sort?
Average Case Analysis (same as Quicksort)
The algorithm is a quicksort in which the partitioning process maintains the order of the elements in each partition.
Consider: given: 3 1 8 2 6 7 5
In turn everything is compared to 3, then to 1 or 8, etc.
- the order of the compares is different than in quicksort, but the concept is the same: namely, at each level about n compares, depth ~ lg n
⇒ about n lg n running time on average, matching the Ω(n lg n) lower bound.
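The Sort(A) procedure can be sketched in C++ as below: insert every element, then read the keys back in order. The `Node` struct and function names are assumptions for this example, and duplicates are sent to the right subtree.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical node; unique_ptr children free the tree automatically.
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

void insert(std::unique_ptr<Node>& p, int k) {
    if (!p) p = std::make_unique<Node>(k);
    else if (k < p->key) insert(p->left, k);
    else insert(p->right, k);            // equal keys go right
}

void inOrder(const Node* p, std::vector<int>& out) {
    if (p == nullptr) return;
    inOrder(p->left.get(), out);
    out.push_back(p->key);
    inOrder(p->right.get(), out);
}

// Tree sort: n insertions followed by one in-order traversal.
std::vector<int> treeSort(const std::vector<int>& a) {
    std::unique_ptr<Node> root;
    for (int k : a) insert(root, k);
    std::vector<int> out;
    out.reserve(a.size());
    inOrder(root.get(), out);
    return out;
}
```

With the input 3 1 8 2 6 7 5, the first key 3 becomes the root and plays the role of the pivot: every later key is compared to it first.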
For a priority queue, to extract the minimum:
Extract_Min(x) - returns a pointer to the Min key
  while left[x] ≠ null
    do x ← left[x]
  return x
*** examine the first tree ( F B …) and see what happens.
Deletion
Why might deletion be harder than insertion?

Lazy Deletion
Instead of physically deleting nodes, just mark them as deleted
+ simpler
+ some adds just flip deleted flag
+ physical deletions done in batches
– extra memory for deleted flag
– many lazy deletions slow finds
– some operations may have to be modified (e.g., min and max)
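Lazy deletion can be sketched as below. This is one possible arrangement, not the slide's own code: the `Node` struct with its `deleted` flag and the function names `locate`, `find`, `lazyDelete` and `insert` are all assumptions for this example.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node carrying the extra flag that lazy deletion costs.
struct Node {
    int key;
    bool deleted = false;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Finds the node holding k whether or not it is marked deleted.
Node* locate(Node* p, int k) {
    while (p != nullptr && p->key != k) {
        p = (k < p->key) ? p->left.get() : p->right.get();
    }
    return p;
}

// A key counts as present only if its node exists and is not marked.
bool find(Node* root, int k) {
    Node* n = locate(root, k);
    return n != nullptr && !n->deleted;
}

// Marks the node instead of unlinking it; the tree shape is unchanged.
void lazyDelete(Node* root, int k) {
    Node* n = locate(root, k);
    if (n != nullptr) n->deleted = true;
}

void insert(std::unique_ptr<Node>& p, int k) {
    if (!p) p = std::make_unique<Node>(k);
    else if (k == p->key) p->deleted = false;   // re-adding just flips the flag
    else if (k < p->key) insert(p->left, k);
    else insert(p->right, k);
}
```

Marked nodes still take part in every search path, which is why many lazy deletions slow finds. Minimum and maximum would also need to skip marked nodes.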
Lazy Deletion
Starting from the example tree (root 10; children 5 and 15; then 2, 9 and 20; then 7, 17 and 30), apply in order:
Delete(17), Delete(15), Delete(5), Find(9), Find(16), Insert(5), Find(17)

Deletion - Leaf Case
Delete(17)
[Figure: the example tree before and after removing the leaf 17]

Deletion - One Child Case
Delete(15)
[Figure: 15 has a single child, 20, which takes its place under 10]

Deletion - Two Child Case
Delete(5)
[Figure: 5 has two children, 2 and 9; 7 is the left child of 9]
replace node with value guaranteed to be between the left and right subtrees: the successor
Could we have used the predecessor instead?
always easy to delete the successor: it always has either 0 or 1 children!

Delete Code
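// Removes key x from the subtree rooted at p. p is passed by reference so
// that the parent's link is updated in place. Named deleteNode because
// "delete" is a reserved word in C++. Comparable is any type with < and >.
void deleteNode(Comparable x, Node *& p) {
  Node * q;
  if (p != NULL) {
    if (p->key < x) deleteNode(x, p->right);
    else if (p->key > x) deleteNode(x, p->left);
    else { /* p->key == x */
      /* zero or one child: splice the child (or NULL) into p's place;
         freeing q assumes nodes were allocated with new */
      if (p->left == NULL) { q = p; p = p->right; delete q; }
      else if (p->right == NULL) { q = p; p = p->left; delete q; }
      else {
        /* two children: copy the successor's key, then delete the successor */
        q = successor(p);   /* minimum of p->right */
        p->key = q->key;
        deleteNode(q->key, p->right);
      }
    }
  }
}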
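To try the routine above, a self-contained variant might look like the following. It is a sketch under assumptions: integer keys, a hypothetical `Node` struct allocated with `new`, and helper names (`insert`, `successor`, `removeKey`, `inOrder`, `destroy`) chosen for this example.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

void insert(Node*& p, int k) {
    if (p == nullptr) p = new Node(k);
    else if (k < p->key) insert(p->left, k);
    else if (k > p->key) insert(p->right, k);   // duplicates are ignored
}

// Smallest node in the right subtree of p; p must have a right child.
Node* successor(Node* p) {
    Node* q = p->right;
    while (q->left != nullptr) q = q->left;
    return q;
}

void removeKey(int x, Node*& p) {
    if (p == nullptr) return;                    // key not present
    if (p->key < x) removeKey(x, p->right);
    else if (p->key > x) removeKey(x, p->left);
    else if (p->left != nullptr && p->right != nullptr) {
        p->key = successor(p)->key;              // two children: copy successor
        removeKey(p->key, p->right);             // then remove it (0 or 1 child)
    } else {
        Node* old = p;                           // zero or one child: splice
        p = (p->left != nullptr) ? p->left : p->right;
        delete old;
    }
}

void inOrder(const Node* p, std::vector<int>& out) {
    if (p == nullptr) return;
    inOrder(p->left, out);
    out.push_back(p->key);
    inOrder(p->right, out);
}

void destroy(Node* p) {
    if (p == nullptr) return;
    destroy(p->left);
    destroy(p->right);
    delete p;
}
```

On the example tree, removing 17 exercises the leaf case, removing 15 the one-child case, and removing 5 the two-child case, where the successor 7 moves into the place of 5.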
Beauty is Only Θ(log n) Deep
Binary Search Trees are fast if they’re shallow:
– e.g.: perfectly complete
– e.g.: perfectly complete except the “fringe” (leaves)
– any other good cases?
What matters here?
Problems occur when one branch is much longer than the other!

Balance
Balance measure:
height(left subtree) - height(right subtree)
• zero everywhere ⇒ perfectly balanced
• small everywhere ⇒ balanced enough
Balance between -1 and 1 everywhere ⇒ maximum height of about 1.44 log n (log base 2)
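The balance measure can be computed directly from subtree heights. In this sketch the `Node` struct and function names are assumptions, and an empty subtree is given height -1 so that a leaf has height 0. Heights are recomputed at every node for clarity, which is simple but not the most efficient approach.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical minimal node for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Height in edges; an empty subtree has height -1.
int height(const Node* n) {
    if (n == nullptr) return -1;
    return 1 + std::max(height(n->left), height(n->right));
}

// height(left subtree) - height(right subtree); n must not be null.
int balance(const Node* n) {
    return height(n->left) - height(n->right);
}

// True if the balance is between -1 and 1 at every node.
bool balancedEverywhere(const Node* n) {
    if (n == nullptr) return true;
    return std::abs(balance(n)) <= 1 &&
           balancedEverywhere(n->left) &&
           balancedEverywhere(n->right);
}
```

A three-node chain that only ever goes right has balance -2 at its root, which is the "one branch much longer than the other" situation.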