Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
What are search trees?
Search Trees
Allow efficient searching of ordered data
Implement Ordered Dictionary ADT
Provide flexible mechanism for storing and
retrieving data
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Binary Search Tree
Each node stores a keykey-element pair
key(leftsubtree)
key(leftsubtree) <= key(node)
key(node) <= key(rightsubtree)
key(rightsubtree)
Trees in Goodrich/Tamassia
Goodrich/Tamassia store elements only at
internal nodes
• I generally store elements at all nodes
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Analysis of TreeSearch
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Tree Search
TreeSearch(k,
TreeSearch(k, node) {
if ((k < key(node)) &&
(leftChild(node)
leftChild(node) != null))
return TreeSearch(k,
TreeSearch(k, leftchild(node));
leftchild(node));
else if (k > key(node)) &&
(rightchild(node)
rightchild(node) != null))
return TreeSearch(k,
TreeSearch(k, rightchild(node));
rightchild(node));
else
return node; }
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Inserting into Binary Tree
Call TreeSearch on root to find appropriate
parent node
At each node, perform O(1) work
Maximum nodes visited is h, the height of
tree
Total running time is thus O(h)
• Call again using a child if key already exists
Parent node will be external
Insert element as new child of parent
Also takes O(h)
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
1
Removing from Binary Tree
AVL Trees
Call TreeSearch to locate node for removal
If node is external, just remove node
If node has one child, replace node with child
Binary search trees which maintain O(logn
(logn)
height
If node has two children
Maintain height balance property
• Heights of children differ by at most 1
• Find smallest element greater than node (will
have 0 or 1 children)
• Local property to maintain, but guarantees
global property of overall height
• Replace element with that node
Again, O(h)
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Analyzing AVL height
Inserting with balanced height
n(h) : minimum nodes for AVL tree of height h
Base conditions
n(0) = 1; n(1) = 2
Insert node into binary search tree as usual
Recurrance relation
n(h) = 1 + n(hn(h-1) + n(hn(h-2) > 2*n(h2*n(h-2)
n(h) > 2*n(h2*n(h-2) > 4*n(h4*n(h-4) > 8*n(h8*n(h-6), etc.
n(h) > 2i*n(h*n(h-2i)
• Increases height of some nodes along path to
root
Walk up towards root
Set i to achieve base condition
h-2i=1 → i=(hi=(h-1)/2
(h-1)/2*n(1) = 2(h(h-1)/2*2 = 2(h(h-1/2)+1
n(h) > 2(h-
• If unbalanced height is found, restructure
unbalanced region with rotation operation
Bounding h
• logn(h)
logn(h) > (h(h-1)/2 + 1
• h < 2(logn(h) - 1) + 1 = 2logn(h) -1 = O(logn
(logn)
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Balanced Tree
3
2
0
15
30
Insert (case 1)
4
50
2
75
1
0
35
40
1
0
45
3
0
60
65
0
70
0
80
0
15
Unbalanced
node
30
50
2
75
2
0
35
40
1
1
45
0
0
60
65
0
0
80
70
43
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
2
“Rotate Left” . . .
4
3
30
“Rotate Left” . . .
4
50
2
75
2
0
40
15
0
35
1
1
65
0
45
3
80
0
75
40
45
0
35
0
0
1
1
0
15
70
60
2
2
30
0
50
65
0
70
60
0
80
0
43
43
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
“Rotate Left” - Balanced!
3
2
40
1
0
2
2
75
1
45
0
0
15
4
50
1
30
Insert (case 2)
65
0
35 43
0
0
80
0
30
15
3
75
1
0
40
0
2
0
65
1
45
35
70
60
50
80
0
70
60
Unbalanced
node
0
53
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
“Rotate Right”. . .
4
2
0
15
30
4
50
0
35
2
3
75
1
40
“Rotate Right”. . .
2
0
1
45
60
0
65
0
70
0
80
0
15
30
50
3
0
35
40
75
2
1
1
0
45
60
65
0
0
0
70
80
53
53
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
3
“Rotate Right” - Balanced!
3
2
0
30
4
50
0
65
1
40
30
0
75
70
0
45
1
0
60
0
35
3
2
1
15
Insert (case 3)
0
80
53
15
50
2
75
2
40
1
Unbalanced
node
0
65
0
45
35
0
1
0
0
80
70
60
33
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
“Double Rotation RightRight-Left” - Right . . .
4
3
0
30
4
50
75
1
40
1
0
35
0
3
2
2
15
“Double Rotation RightRight-Left” - Right . . .
65
0
45 60
0
0
80
50
0
15
70
0
33
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
45
0
15
30
2
35
0
33
1
1
40
0
0
3
2
75
60
65
0
70
45
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
0
80
70
60
“Double Rotation RightRight-Left” - Left . . .
4
50
65
0
0
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
“Double Rotation RightRight-Left” - Right . . .done!
4
1
40
35
0
75
2
1
33
3
2
30
0
80
0
15
30
0
33
50
2
75
2
35
1
1
40
0
0
60
65
0
0
80
70
45
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
4
“Double Rotation RightRight-Left” - Left . . .
4
0
15
3
2
30
35
0
33
3
50
2
2
75
1
1
40
“Double Rotation RightRight-Left” - Balanced!
0
0
45
65
60
0
80
70
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Restructure Procedure
Consider the first unbalanced node
encountered (walking upward) and its two
descendants along that path
Sort them in increasing order and label as a,
b, and c
Place b as the parent of a and c where the
unbalanced node was
Hook up the (up to) 4 subtrees as the
appropriate children of a and c
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Remove Algorithm
Perform removal as with binary search tree
• May decrease height of some nodes on path to
the root
Walk upwards to the root
• If unbalanced height is found, restructure
unbalanced region with rotation operation
Remove is also O(logn
(logn)
• But multiple restructure operations may be
necessary along the way
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
1
0
0
15
30
35
0
33
50
2
75
1
40
1
0
45
0
60
65
0
0
80
70
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Analyzing Insert
Upward traversal with height recomputation
takes O(h) = O(logn
(logn)
Restructure takes O(1)
The restructure always reduces the height of
the unbalanced node
• So only one restructure is necessary
Total time: O(logn
(logn) + O(1) = O(logn
(logn)
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
MultiMulti-way Search Trees
Each node may store multiple keykey-element
pairs
Node with d children (d
(d-node) stores d-1 keykeyelement pairs
Children have keys that fall either before
smallest parent key, after largest parent key,
or between two parent keys
(for this section, let’s use convention of external
nodes storing no element, as in book)
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
5
Example MultiMulti-way Search Tree
MultiMulti-way Tree Searching
50
10 15
25
Basically same as for binary tree
60 70 80
20 30
40 42 45
55
64 66
1. Start at root
75
85 90
2. Find appropriate child path to go down
3. Traverse to child
22
27
4. Repeat 11-3 until found or reach external
External node between each pair of keys and before/after
(n(n-1) + 1 + 1 = n+1 external nodes
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
MultiMulti-way Search Analysis
Number of nodes traversed is up to h
Work at each node is function of d
• O(logd
O(logd) if structure storing keys provides
efficient search, otherwise O(d)
Total worst case time
• O(hlogd
logdmax) or O(hdmax)
• If dmax is bounded by small constant, just O(h)
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
(2,4) Tree Height Analysis
Lower bound on h
n(h) <= 4h : max children per node is 4
# external nodes = n+1
n+1 <= 4h ⇒ h >= log(n+1) / 2
Upper bound on h
At least 2 nodes at depth 1, 4 at depth 2, etc.
At least 2d nodes at depth d
At least 2h external nodes
2h <= n+1 ⇒ h <= log(n+1)
h = Θ(logn
logn) ⇒ search is O(logn
(logn)
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
(2,4) Trees
Efficient multimulti-way search trees
• O(logn
(logn) height
Maintain two properties:
1. Size property: nodes have at most 4 children
2. Depth property: All external nodes have same
depth (i.e. all at the same level)
level)
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Inserting into (2,4) Tree
1. Search for position in deepest internal
node
2. Insert into position
3. If # elements > 3, do a split operation
• Split node into 2 nodes
• Push 1 element up to parent
—Create new root if no parent
—If parent overflows, split parent
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
6
Simple Insertion (no overflow)
10
Insertion with Overflow
10
5
12 14
5
10
12 14 15
5
10
Insert 11
12 14 15
5
11 12 14 15
Split
Insert 15
10 14
11 12
5
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Insert with Cascading Split
6 8 10
5
7
9
Insert 11
7
1. Search for element
2. Remove element
3. If element’s child is internal
• Swap next larger element into hole (so we’ve
removed element above an external)
11 12 14 15
9
10
6 8
Removing from (2,4) Tree
6 8 10
5
12 14 15
4. If node has no elements
Split
Split
14
15
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
If an adjacent sibling has > 1 element
Perform transfer (kind of rotation)
6 8 10 14
Else
5
7
11 12
9
15
5
7
11 12
9
Perform fusion (can cascade upward)
15
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Simple Removal
6 8 10
5
7
9
Removal with Swap
Remove 14
12 14 15
5
6 8 10
7
9
6 8 10
12 15
5
7
9
6 8
Remove 10
12 14 15
5
7
9
12 14 15
Swap
6 8 12
5
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
7
9
14 15
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
7
Removal with Transfer
Remove 9
6 8 10
5
7
9
Removal with Fusion
12 14 15
6 8 10
5
7
6 8 10
12 14 15
5
7
9
6 8 10
Remove 7
5
12 14 15
9
Fusion
Transfer
(~rotate)
6 10
6 8 12
5
7
10
12 14 15
5
14 15
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Performance
8 9
12 14 15
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
External Memory Searching
Memory Hierarchy
Insertion
Registers
• Find is O(logn
(logn)
• Each split is O(1); only O(logn)
(logn) splits necessary
Remove
Cache
RAM
• Each transfer/fusion is O(1); only O(logn)
(logn)
necessary
External Memory
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Types of External Memory
Hard disk
Floppy disk
Compact disc
Tape
Distributed/networked memory
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Primary Motivation
External memory access much slower than
internal memory access
• orders of magnitude slower
• need to minimize I/O Complexity
• can afford slightly more work on data in
memory in exchange for lower I/O complexity
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
8
Application Areas
Searching
Sorting
Data Processing
Data Mining
Disk Blocks
Data is read one block at a time
• pack as much into a block as possible
• minimize number of block reads necessary
Data Exploration
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
I/O Efficient Dictionaries
Balanced tree structures
• Typically O(log2n) transfers for query or
update
• Want to reduce height by constant factor as
much as possible
• Can be reduced to O(logBn) = O(log2n/log2B)
—B is number of nodes per block
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
(a,b) Trees
Generalization of (2,4) trees
Size property: internal node has at least a
children and at most b children
• 2 <= a <= (b+1)/2
Depth property: all external nodes have same
depth
Height of (a,b) tree is Ω (logn/logb)
logn/logb) and
O(logn/loga)
O(logn/loga)
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
B-Trees
Choose a and b to be Θ(B)
Height is now O(logBn)
I/O complexity for search is O(logBn)
Johns Hopkins Department of Computer Science
Course 600.226: Data Structures, Professor: Jonathan Cohen
9