Data Structures and
Algorithms for Information
Processing
Lecture 6:
Heaps, B-Trees, and B+Trees
90-723: Data Structures
and Algorithms for
Information Processing
Lecture 6: Heaps & B-Trees
Copyright © 1999, Carnegie Mellon. All Rights Reserved.
1
Homework Policy
• Late homework will normally be
penalized 10% per day late;
• Each student may turn in one late
homework with no penalty (up to
one week late)
Grading
• Homeworks (4-5): 50%
• Midterm Exam: 25%
• Final Exam: 25%
Today’s Topics
• Ways to Balance Trees
– Heaps & Priority Queues
– B-Trees
• Time Analysis of Trees
– Binary trees
– Heaps
– B-Trees
• See Chapter 10 in Main
• B+ trees
Binary Trees: Worst Case
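[Figure: inserting 1, 2, 3, 4, 5, 6 in that order produces a chain of six nodes, each the right child of the one before]
• Inserting nodes that are already sorted leads to worst-case behavior: d = n - 1 = 5
• How can we use the idea of balanced trees to avoid this kind of situation?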
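To see the worst case concretely, here is a minimal Java sketch (not from the course materials) of an ordinary binary search tree with no balancing. The names SortedInsertDemo, insert, depth, and depthAfterInserting are illustrative; depth counts edges, so a single node has depth 0.

```java
// Minimal unbalanced binary search tree, used only to measure depth.
class SortedInsertDemo {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Plain BST insertion with no rebalancing (duplicates are ignored).
    static Node insert(Node root, int value) {
        if (root == null) return new Node(value);
        if (value < root.value) root.left = insert(root.left, value);
        else if (value > root.value) root.right = insert(root.right, value);
        return root;
    }

    // Depth = edges on the longest root-to-leaf path (-1 for an empty tree).
    static int depth(Node root) {
        if (root == null) return -1;
        return 1 + Math.max(depth(root.left), depth(root.right));
    }

    static int depthAfterInserting(int... values) {
        Node root = null;
        for (int v : values) root = insert(root, v);
        return depth(root);
    }

    public static void main(String[] args) {
        System.out.println(depthAfterInserting(1, 2, 3, 4, 5, 6)); // sorted input: 5
        System.out.println(depthAfterInserting(4, 2, 6, 1, 3, 5)); // mixed order: 2
    }
}
```

With sorted input every new node becomes a right child, so the depth is n - 1; the same six values inserted in a different order give depth 2.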
Balanced Trees
[Figure: example balanced binary trees]
• Balanced trees are “no deeper than they have to be”
• Complete binary trees minimize depth by filling each level completely before the depth d increases
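• Heaps are complete binary trees, so their depth is the minimum possible for n nodes, regardless of the order in which elements are inserted.
• Heaps are not search trees: the heap ordering only relates each node to its children (in a max-heap, every node's element is at least as large as its children's), so it cannot guide a search toward a specific element.
• See Main’s slides on heaps.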
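The following is a minimal sketch of a heap stored in an array, assuming a max-heap (each parent at least as large as its children) and the common layout where the children of index i sit at indexes 2i + 1 and 2i + 2. IntMaxHeap, add, removeMax, and depth are hypothetical names, not Main's API. Because the array is always filled left to right, the tree stays complete, and its depth depends only on n.

```java
import java.util.Arrays;

// Hypothetical array-based max-heap: a complete binary tree stored level by level,
// so the children of index i live at 2i + 1 and 2i + 2.
class IntMaxHeap {
    private int[] data = new int[8];
    private int size = 0;

    void add(int element) {
        if (size == data.length) data = Arrays.copyOf(data, 2 * size);
        int i = size++;
        data[i] = element;
        // Reheapify upward: swap with the parent while the parent is smaller.
        while (i > 0 && data[(i - 1) / 2] < data[i]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    int removeMax() {
        if (size == 0) throw new IllegalStateException("empty heap");
        int max = data[0];
        data[0] = data[--size];   // move the last leaf to the root
        int i = 0;
        // Reheapify downward: swap with the larger child while that child is larger.
        while (2 * i + 1 < size) {
            int child = 2 * i + 1;
            if (child + 1 < size && data[child + 1] > data[child]) child++;
            if (data[i] >= data[child]) break;
            swap(i, child);
            i = child;
        }
        return max;
    }

    // Depth of the last occupied slot (0 for an empty or one-element heap).
    int depth() {
        int d = 0;
        for (int i = size - 1; i > 0; i = (i - 1) / 2) d++;
        return d;
    }

    private void swap(int a, int b) { int t = data[a]; data[a] = data[b]; data[b] = t; }

    public static void main(String[] args) {
        IntMaxHeap sorted = new IntMaxHeap();
        IntMaxHeap mixed = new IntMaxHeap();
        for (int v = 1; v <= 6; v++) sorted.add(v);
        for (int v : new int[] {4, 2, 6, 1, 3, 5}) mixed.add(v);
        System.out.println(sorted.depth() + " " + mixed.depth());          // 2 2
        System.out.println(sorted.removeMax() + " " + sorted.removeMax()); // 6 5
    }
}
```

Unlike the unbalanced search tree, inserting 1 through 6 in sorted order still yields depth 2, because the shape of the tree is fixed by the array layout rather than by the values.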
B-Trees
• B-Trees are a type of search tree
• They reduce depth further for a given number of elements n
• Two adjustments:
– nodes have more than two children
– nodes hold more than a single
element
B-Trees
• Can be implemented as a set (no
duplicate elements) or as a bag
(duplicate elements allowed)
• This example focuses on the set
implementation
B-Trees
• Every B-Tree depends on a
positive constant, MINIMUM, which
determines how many elements
are held in a single node
• Rule 1: The root may have as few as one element (or none, if it also has no children); every other node has at least MINIMUM elements
B-Trees
• Rule 2: The maximum number of
elements in a node is twice the
value of MINIMUM
• Rule 3: Elements in a node are
stored in a partially-filled array,
sorted from smallest (element 0)
to largest (final position used)
B-Trees
• Rule 4: The number of subtrees
below a non-leaf node is always
one more than the number of
elements in the node
B-Trees
• Rule 5: For any non-leaf node:
– The element at index I is greater than
all the elements in subtree number I
of the node
– The element at index I is less than all the elements in subtree number (I + 1) of the node
B-Trees
[Figure: a node holding 93 and 107, with subtrees number 0, 1, and 2]
• Each element in subtree 0 is less than 93.
• Each element in subtree 1 is between 93 and 107.
• Each element in subtree 2 is greater than 107.
B-Trees
• Rule 6: Every leaf in a B-Tree has
the same depth
• As a result, B-Trees are always balanced.
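As an illustration, here is a hypothetical Java checker for the six rules on a set B-Tree. BNode is a stand-in node class whose field names mirror IntBalancedSet's; it is not the course code. Rule 5 is checked by passing down the open interval each subtree's elements must fall in, and Rule 6 by requiring every child to report the same leaf depth.

```java
// Hypothetical checker for the six B-Tree rules (set version, so no duplicates).
// BNode is an illustrative stand-in whose field names mirror IntBalancedSet.
class BTreeRules {
    static class BNode {
        int[] data;        // data[0 .. dataCount-1]
        BNode[] subset;    // subset[0 .. childCount-1]
        int dataCount, childCount;

        BNode(int[] data, BNode... children) {
            this.data = data;
            this.dataCount = data.length;
            this.subset = children;
            this.childCount = children.length;
        }
    }

    static boolean isValid(BNode root, int minimum) {
        if (root.dataCount == 0) return root.childCount == 0;  // Rule 1: an empty root has no children
        return leafDepth(root, minimum, true, Long.MIN_VALUE, Long.MAX_VALUE) >= 0;
    }

    // Returns the depth shared by every leaf below node, or -1 if a rule is broken.
    // Every element of node must lie strictly between lo and hi (bounds inherited via Rule 5).
    private static int leafDepth(BNode node, int minimum, boolean isRoot, long lo, long hi) {
        int n = node.dataCount;
        if (!isRoot && n < minimum) return -1;                       // Rule 1
        if (n > 2 * minimum) return -1;                              // Rule 2
        for (int i = 0; i < n; i++) {
            if (node.data[i] <= lo || node.data[i] >= hi) return -1;   // Rule 5
            if (i > 0 && node.data[i - 1] >= node.data[i]) return -1; // Rule 3: sorted, no duplicates
        }
        if (node.childCount == 0) return 0;                          // a leaf
        if (node.childCount != n + 1) return -1;                     // Rule 4
        int depth = -1;
        for (int i = 0; i <= n; i++) {
            long childLo = (i == 0) ? lo : node.data[i - 1];
            long childHi = (i == n) ? hi : node.data[i];
            int d = leafDepth(node.subset[i], minimum, false, childLo, childHi);
            if (d < 0 || (depth >= 0 && d != depth)) return -1;      // child failed, or Rule 6
            depth = d;
        }
        return depth + 1;
    }

    static BNode leaf(int... elements) { return new BNode(elements); }

    // The example tree from the slides (MINIMUM = 1).
    static BNode exampleTree() {
        return new BNode(new int[] {6},
                new BNode(new int[] {2, 4}, leaf(1), leaf(3), leaf(5)),
                new BNode(new int[] {9}, leaf(7, 8), leaf(10)));
    }

    // Leaves at depths 1 and 2, which breaks Rule 6.
    static BNode unevenTree() {
        return new BNode(new int[] {6},
                leaf(2),
                new BNode(new int[] {9}, leaf(7), leaf(10)));
    }

    public static void main(String[] args) {
        System.out.println(isValid(exampleTree(), 1));  // true
        System.out.println(isValid(unevenTree(), 1));   // false
    }
}
```

Checking Rule 6 bottom-up this way is what makes "always balanced" concrete: a subtree whose leaves disagree on depth is rejected before its parent is considered.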
B-Tree Example
[Figure: a B-Tree with MINIMUM = 1]
• Root: 6
• Children of the root: (2, 4) and (9)
• Leaves: 1, 3, and 5 below (2, 4); (7, 8) and 10 below (9)
NOTE: Every child of the root node is also a B-Tree!
Set ADT with B-Trees
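public class IntBalancedSet {
   // constants
   private static final int MINIMUM = 200;
   private static final int MAXIMUM = 2 * MINIMUM;
   // info about root node
   int dataCount;                       // number of elements in the root
   int[] data = new int[MAXIMUM + 1];   // one extra slot for temporary excess during add
   int childCount;                      // number of subtrees (0 if the root is a leaf)
   // info about children
   IntBalancedSet[] subset =
      new IntBalancedSet[MAXIMUM + 2];  // one extra slot, matching data
   // … (methods omitted on this slide)
}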
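[Figure: the example B-Tree (MINIMUM = 1, MAXIMUM = 2) and the instance variables of its root node]
• dataCount: 1
• data: [6, ?, ?] (? marks unused slots)
• childCount: 2
• subset: [reference to the subtree (2, 4), reference to the subtree (9), null, null]
(The entries of subset are references to IntBalancedSet instances.)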
Invariant for Set B-Tree
• The elements of the set are stored
in a B-Tree, satisfying the 6 rules
• The number of elements in the
root is stored in the instance
variable dataCount, and the
number of subtrees is stored in the
instance variable childCount.
Invariant for Set B-Tree
• The root’s elements are stored in data[0] through data[dataCount - 1].
• If the root has subtrees, then
subset[0] through
subset[childCount - 1] are
references to those subtrees.
Searching a B-Tree
• Sets use the method contains to find whether an element is in the set:
– Let I be the first index such that data[I] >= target; if there is no such index, let I = dataCount
– If I < dataCount and data[I] == target, return true;
else if the root has no children, return false;
else return subset[I].contains(target);
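The steps above can be sketched in Java as follows. IntBalancedSetSketch is a hypothetical stripped-down node class with the same four instance variables as IntBalancedSet, plus a convenience constructor for building a tree by hand; Main's own implementation may differ. The index is found with a linear scan; since data is sorted, a binary search would also work.

```java
// Sketch of contains() following the slide's steps, on a stripped-down node class.
// IntBalancedSetSketch and its constructor are hypothetical helpers for building a tree by hand.
class IntBalancedSetSketch {
    int dataCount;                  // number of elements in this node
    int[] data;                     // data[0 .. dataCount-1], sorted ascending
    int childCount;                 // number of subtrees (0 for a leaf)
    IntBalancedSetSketch[] subset;  // subset[0 .. childCount-1]

    IntBalancedSetSketch(int[] elements, IntBalancedSetSketch... children) {
        data = elements;
        dataCount = elements.length;
        subset = children;
        childCount = children.length;
    }

    boolean contains(int target) {
        // Step 1: first index i with data[i] >= target, or dataCount if there is none.
        int i = 0;
        while (i < dataCount && data[i] < target) i++;
        // Step 2: the i < dataCount guard avoids reading past the node's elements.
        if (i < dataCount && data[i] == target) return true;
        if (childCount == 0) return false;    // a leaf: nowhere else to look
        return subset[i].contains(target);    // by Rule 5, target can only be in subtree i
    }

    static IntBalancedSetSketch leaf(int... elements) { return new IntBalancedSetSketch(elements); }

    // The example B-Tree from the slides (MINIMUM = 1).
    static IntBalancedSetSketch exampleTree() {
        return new IntBalancedSetSketch(new int[] {6},
                new IntBalancedSetSketch(new int[] {2, 4}, leaf(1), leaf(3), leaf(5)),
                new IntBalancedSetSketch(new int[] {9}, leaf(7, 8), leaf(10)));
    }

    public static void main(String[] args) {
        IntBalancedSetSketch tree = exampleTree();
        System.out.println(tree.contains(7));   // true
        System.out.println(tree.contains(11));  // false
    }
}
```

Each call either returns or descends exactly one level, so a search visits at most d + 1 nodes.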
Sample Search
contains(7) on the example tree:
• At the root (6): 7 > 6, so I = dataCount = 1; call subset[1].contains(7)
• At node (9): 9 >= 7, so I = 0, but data[0] != 7; call subset[0].contains(7)
• At leaf (7, 8): 7 >= 7, so I = 0, and data[0] == 7: return true
Add/Remove from B-Tree
• Complex two-pass operations
• pp. 500-512
• Covered on next slide set for 2-3
trees
Trees, Logs, Time Analysis
• Heaps and B-Trees are efficient
because d is kept small
• How can we relate the depth of a
tree and the worst-case time
required to search, add, and
remove an element?
Trees, Logs, Time Analysis
• The worst-case times for the following operations are all O(d):
– Adding an element to a binary search tree, heap, or B-Tree
– Removing an element from a binary search tree, heap, or B-Tree
– Searching for a specified element in a binary search tree or B-Tree
Trees, Logs, Time Analysis
• How can we relate the depth d to
the number of elements n?
• Example: binary trees
– d is no more than n - 1
– O(d) is therefore O(n - 1) = O(n)
(remember, we can ignore constants)
Time Analysis for Heaps
• Heaps: nodes needed to fill each level
– Level 0: 1
– Level 1: 2
– Level 2: 4
– Level 3: 8
– …
– Level d: 2^d
Time Analysis for Heaps
• Minimum nodes to reach depth d in a heap (levels 0 through d - 1 full, plus one node on level d):
$$(1 + 2 + 4 + \cdots + 2^{d-1}) + 1 = 2^d$$
• The number of nodes in a heap of depth d is at least 2^d
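For example, with d = 3, levels 0 through 2 hold 1 + 2 + 4 = 7 nodes when full, and one more node is needed to start level 3:
$$(1 + 2 + 4) + 1 = 8 = 2^3$$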
Review Base-2 Logarithms
• For any positive number x, the base-2 logarithm of x is the exponent r such that:
$$2^r = x$$
Review Base-2 Logarithms
$$\begin{aligned}
2^0 &= 1, & \log_2 1 &= 0 \\
2^1 &= 2, & \log_2 2 &= 1 \\
2^2 &= 4, & \log_2 4 &= 2 \\
&\;\;\vdots \\
2^d &= 2^d, & \log_2 2^d &= d
\end{aligned}$$
Worst-Case For Heaps
• In a heap the number of elements n is at least 2^d. Because log2 is an increasing function, taking logarithms preserves the inequality:
$$\log_2 n \ge \log_2 2^d = d$$
so $d \le \log_2 n$.
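A quick numeric check of this bound, as a Java sketch: assuming the usual array layout for a heap (children of index i at 2i + 1 and 2i + 2), heapDepth walks parent links from the last occupied slot. The names HeapDepthCheck, heapDepth, and floorLog2 are illustrative. The loop confirms, for every n up to one million, that n ≥ 2^d and that d equals floor(log2 n).

```java
// Checks the claim d <= log2(n) for heaps stored as complete binary trees.
// heapDepth() walks parent links (i -> (i - 1) / 2) from the last array slot,
// assuming the usual layout where the children of index i are 2i + 1 and 2i + 2.
class HeapDepthCheck {
    static int heapDepth(int n) {
        int d = 0;
        for (int i = n - 1; i > 0; i = (i - 1) / 2) d++;
        return d;
    }

    // floor(log2 n) for n >= 1, computed exactly with integer bit operations.
    static int floorLog2(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 1_000_000; n++) {
            int d = heapDepth(n);
            // n >= 2^d (minimum nodes to reach depth d) and d == floor(log2 n)
            if ((1L << d) > n || d != floorLog2(n)) throw new AssertionError("n = " + n);
        }
        System.out.println(heapDepth(1_000_000)); // 19
    }
}
```

A heap with a million elements is therefore only 19 levels deep, which is why its add and remove operations stay fast.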
Worst-Case For Heaps
• Adding or removing an element in a heap with n elements is O(d), where d is the depth of the tree. Because d is no more than log2(n), the operations are O(log2(n)), which is O(log(n)) (logarithms in different bases differ only by a constant factor).
• (See discussion pp. 516-520.)
Many Databases use B+ Trees
[Figure: B+ tree diagram, from Wikipedia]