COMP200 - Data Structures
Non-Linear Data Structures - Trees
Rob Dempster
School of Computer Science
University of KwaZulu-Natal
Pietermaritzburg Campus
COMP200 - Data Structures – p. 1
Abstract
This is not a paper. It is a segment of the lecture presentation slides I used to provide a
framework for the Data Structures lectures I presented while my (ex) boss spent a year (on
Sabbatical) gadding about the northern hemisphere. The slides were prepared using SuSE
Linux, Emacs, LaTeX and Prosper.
© 2008, Robert Dempster. These are free slides. This work is licensed under a Creative
Commons Attribution-ShareAlike 2.5 License. (This license allows you to redistribute these
slides and handouts in unmodified form. It allows you to make and distribute modified
versions, as long as you include an attribution to the original author, clearly describe the
modifications that you have made, and distribute the modified work under the same license
as the original. See the http://creativecommons.org/licenses/by-sa/2.5/ for full
details.)
The most recent versions of these slides and handouts are always available, at no charge, for
downloading and for on-line use at the Web address
http://java-sun.cs.ukzn.ac.za/~robd/dsslides/ . There you will find the
LaTeX source code together with the slides in formats suitable for slide presentations (.a4.sp.)
and hand-outs (.a4.ho.).
Introduction
In the previous section we studied the properties of linear (sequential) data
types that could be used to efficiently represent and manipulate linear
arrangements of data.
In this section we will study the properties of a subset of the nonlinear data
types that could be used to efficiently represent and manipulate nonlinear
arrangements of data.
The nonlinear data we will be concerned with here are those that are
either explicitly arranged in a hierarchical manner, or result in a hierarchical
structure when processed.
Two classical examples are:
Family trees, and
Possible routes from a starting point to a particular destination:
shortest path, spanning trees and minimum spanning trees, and
(business) decision tables or game trees.
Definitions
A general tree (ADT) can be defined recursively as follows:
My definition:
1. A tree is either empty (NULL), i.e., contains no elements, or
2. It consists of a root node that has zero or more (sub)trees associated with it.
From [DROZ1996]:
1. An empty structure (i.e., one that contains no elements/nodes) is an empty tree.
2. If t1, t2, ..., tk are disjoint (independent/unconnected) trees, then the structure whose
root has as its children the roots of t1, t2, ..., tk is also a tree.
3. Only structures generated by rules 1 and 2 are trees.
From [HORO1995] - A tree is a finite set of one or more nodes such that
1. There is a specially designated node called the root.
2. The remaining nodes are partitioned into n ≥ 0 disjoint sets T1, T2, ..., Tn, where
each of these sets is a tree. T1, T2, ..., Tn are called the subtrees of the root.
Tree Examples
Some Terminology
(i) is an empty tree i.e., it contains no nodes.
(ii) is a tree containing a single node, that is also the root node.
(iii) is a tree consisting of a root node and three subtrees.
in (iii) the nodes b, c and d are children of the parent a.
nodes which have no children are generally referred to as the leaves of the tree. In
case you have not noticed, our trees are upside down.
the ancestors of a node are all the nodes along the unique path from the root to that
node.
in (v) the nodes b, c and d are siblings, i.e., brothers/sisters. Note that e and f are not
siblings of b, c and d.
the level of a node is equal to the number of arcs/links/edges traversed from the root
to the node plus 1.
the height (or depth) of a tree is equal to the maximum level of any node in the tree.
For (v) it is 5.
More Tree Terminology
The number of subtrees of a node is called its degree.
The degree of a tree is the maximum of the degree of the
nodes in the tree.
An n-ary tree can be defined as follows:
An n-ary tree is either empty (NULL), i.e., contains no
elements, or
It consists of a root node that has at most n (sub)trees
attached to it.
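The terminology above (level, height, degree) can be sketched in Java as follows. This is only an illustration under stated assumptions: the class GeneralTreeDemo, its Node type and the small tree built in main are hypothetical and are not taken from the figures in these slides. The root is at level 1 and an empty tree has height 0, following the definitions given earlier.

```java
import java.util.ArrayList;
import java.util.List;

class GeneralTreeDemo {
    /** A node of a general tree: a label plus zero or more subtrees. */
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Height = maximum level of any node; the root is at level 1. */
    static int height(Node root) {
        if (root == null) return 0;              // empty tree
        int tallest = 0;
        for (Node c : root.children) tallest = Math.max(tallest, height(c));
        return tallest + 1;
    }

    /** Degree of a tree = maximum number of subtrees of any node. */
    static int degree(Node root) {
        if (root == null) return 0;
        int max = root.children.size();
        for (Node c : root.children) max = Math.max(max, degree(c));
        return max;
    }

    public static void main(String[] args) {
        // a has children b, c, d; b has children e, f (a made-up example)
        Node a = new Node("a")
                .add(new Node("b").add(new Node("e")).add(new Node("f")))
                .add(new Node("c"))
                .add(new Node("d"));
        System.out.println("height = " + height(a)); // height = 3
        System.out.println("degree = " + degree(a)); // degree = 3
    }
}
```

Restricting every node to at most n children in such a structure gives the n-ary tree defined above.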
More Tree Terminology (cont’d.)
Consider the following examples for n = 2 (better known as Binary
Trees):
If all the leaves in a tree are either at a height h or h + 1, then the
tree is balanced. Both (i) and (ii) are balanced.
Allowing a tree to grow in an unbalanced manner may result in
inefficiencies in terms of some of the algorithms that use the tree.
If all the leaves in a tree are at a height h then the tree is complete.
Only (i) is complete.
For all nonempty, binary trees, whose non-terminal nodes have
exactly two nonempty children, the number of leaves m is greater
than the number of non-terminal nodes k and m = k + 1.
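The relation m = k + 1 can be checked on a small example. The sketch below is an illustration only; FullBinaryTreeDemo and its Node type are hypothetical names, and the tree built in main is made up. It counts leaves and non-terminal nodes in a binary tree whose non-terminal nodes all have exactly two children.

```java
class FullBinaryTreeDemo {
    static class Node {
        Node left, right;
        Node() { }                                   // a leaf
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    /** Number of leaves (nodes with no children). */
    static int leaves(Node n) {
        if (n == null) return 0;
        if (n.left == null && n.right == null) return 1;
        return leaves(n.left) + leaves(n.right);
    }

    /** Number of non-terminal nodes (nodes with at least one child). */
    static int nonTerminals(Node n) {
        if (n == null || (n.left == null && n.right == null)) return 0;
        return 1 + nonTerminals(n.left) + nonTerminals(n.right);
    }

    public static void main(String[] args) {
        // root with two children; the left child has two leaf children
        Node t = new Node(new Node(new Node(), new Node()), new Node());
        System.out.println("m = " + leaves(t));        // m = 3
        System.out.println("k = " + nonTerminals(t));  // k = 2
    }
}
```

Replacing any leaf by a non-terminal node with two leaf children adds one to both m and k, which is why the difference stays at 1.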
Balanced Binary Trees
Tree Traversals and Binary Search Trees
If the nodes in a tree have been placed such that a particular traversal of the tree reveals
a specific order in some way, then it is an ordered tree. Both these trees exhibit order:
An in-order traversal of (i), returns the list of letters sorted alphabetically.
For (ii), (taking note of how unary operators are taken care of):
pre-order returns the pre-fix expression
in-order returns the in-fix expression
post-order returns the post-fix expression
Ordered trees are often maintained as such to improve the efficiency of searching
algorithms.
Binary trees are used in this context so often, that we refer to Binary Search Trees
(BST), Decision Trees and Game Trees:
Average case search: O(log n) for BSTs versus O(n) (about n/2 comparisons) for a sequential
search of a typical ordered linear data structure.
Worst case: both O(n).
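The three depth first traversals can be sketched in Java as follows. This is a minimal illustration, not the course's BT class: TraversalDemo, its Node type and the letters inserted in main are assumptions made for the example. An in-order traversal of the resulting BST yields the letters in alphabetical order.

```java
class TraversalDemo {
    static class Node {
        final char key;
        Node left, right;
        Node(char key) { this.key = key; }
    }

    /** Ordinary BST insertion; returns the (possibly new) subtree root. */
    static Node insert(Node root, char key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    static void preOrder(Node n, StringBuilder out) {
        if (n == null) return;
        out.append(n.key);            // visit the node first
        preOrder(n.left, out);
        preOrder(n.right, out);
    }

    static void inOrder(Node n, StringBuilder out) {
        if (n == null) return;
        inOrder(n.left, out);
        out.append(n.key);            // visit between the two subtrees
        inOrder(n.right, out);
    }

    static void postOrder(Node n, StringBuilder out) {
        if (n == null) return;
        postOrder(n.left, out);
        postOrder(n.right, out);
        out.append(n.key);            // visit the node last
    }

    public static void main(String[] args) {
        Node root = null;
        for (char c : "MFTAHPZ".toCharArray()) root = insert(root, c);
        StringBuilder pre = new StringBuilder(), in = new StringBuilder(), post = new StringBuilder();
        preOrder(root, pre);
        inOrder(root, in);
        postOrder(root, post);
        System.out.println(pre);   // MFAHTPZ
        System.out.println(in);    // AFHMPTZ
        System.out.println(post);  // AHFPZTM
    }
}
```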
Tree Traversals - Breadth First
The traversals we have examined here are also depth first traversals.
The traversal algorithm always descends to the greatest possible depth within
the tree, before back tracking and trying an alternate path.
This means that a search based on these algorithms, may traverse several
alternate paths, before it finds a solution that exists at a depth, significantly
less than the height of the tree.
This suggests that an alternative approach that traverses the tree breadth first,
rather than depth first may be more appropriate.
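A breadth first traversal is usually written with a queue rather than a stack. The following is a minimal sketch, assuming a plain binary tree with left and right links; BreadthFirstDemo and its Node type are hypothetical names, not part of the course code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class BreadthFirstDemo {
    static class Node {
        final char key;
        final Node left, right;
        Node(char key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    static Node leaf(char key) { return new Node(key, null, null); }

    /** Visits the nodes level by level, left to right within a level. */
    static String breadthFirst(Node root) {
        StringBuilder out = new StringBuilder();
        Queue<Node> queue = new ArrayDeque<>();
        if (root != null) queue.add(root);
        while (!queue.isEmpty()) {
            Node n = queue.remove();          // oldest node waiting = shallowest
            out.append(n.key);
            if (n.left != null) queue.add(n.left);
            if (n.right != null) queue.add(n.right);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Node root = new Node('M',
                new Node('F', leaf('A'), leaf('H')),
                new Node('T', leaf('P'), leaf('Z')));
        System.out.println(breadthFirst(root)); // MFTAHPZ
    }
}
```

Because nodes are visited in order of increasing level, a search built on this loop reaches a shallow solution before it explores any deeper node.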
While a discussion of tree ADTs should not concern itself with implementation
details, we have already done so in terms of the figures we have used to
represent trees.
We have also done so in terms of some of the terminology we have
introduced.
Tree Traversals - Breadth First (cont’d.)
The following representation/implementation is presented as an
alternative for one of the trees (v) presented earlier.
This representation certainly supports breadth first traversals
more directly.
It also certainly makes it possible to proceed from one sibling to
another directly.
Traversals can also be supported by using the arcs linking the
nodes in a different manner and/or by using additional arcs.
We can also avoid the use of extra arcs if we use small tags,
possibly dynamically, to determine how arcs are to be
interpreted.
Breadth First Searching Tree Example
Threaded Tree Traversals
The BST in the following diagram employs tags to enhance its interpretation of
arcs such that they can be followed directly to perform one of the binary tree
traversals.
In this implementation, if a tag implies that an arc does not point to a node in
the conventional sense, then it serves as a thread (pointer) which points either
to the node’s predecessor or to its successor.
The threads used in this form of (threaded) tree can thus be used to traverse
the tree without having to employ a stack either directly, or implicitly, via
recursion.
(The threads are in essence a form of implicit hardwired stack.)
Finally it is possible to traverse a tree without an implicit runtime stack, or
resorting to explicit threads.
This can be done by creating the stack (or threads) dynamically during the
traversal of the tree.
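The tag idea can be sketched with a right-threaded BST. This is one possible variant chosen for illustration and is not necessarily the layout used in the course's diagram: ThreadedTreeDemo, Node and the rightIsThread tag are hypothetical names. When the tag is set, the right link is a thread to the node's in-order successor rather than a link to a child.

```java
import java.util.ArrayList;
import java.util.List;

class ThreadedTreeDemo {
    static class Node {
        final int key;
        Node left, right;
        boolean rightIsThread;     // tag: true means right points to the successor
        Node(int key) { this.key = key; }
    }

    /** Inserts key and keeps the successor threads up to date. */
    static Node insert(Node root, int key) {
        Node n = new Node(key);
        if (root == null) return n;
        Node p = root;
        while (true) {
            if (key < p.key) {
                if (p.left == null) {          // n becomes a left child: its successor is p
                    p.left = n;
                    n.right = p;
                    n.rightIsThread = true;
                    return root;
                }
                p = p.left;
            } else {
                if (p.right == null || p.rightIsThread) {
                    n.right = p.right;         // n inherits p's successor (or null)
                    n.rightIsThread = p.rightIsThread;
                    p.right = n;               // p now has a real right child
                    p.rightIsThread = false;
                    return root;
                }
                p = p.right;
            }
        }
    }

    static Node leftmost(Node n) {
        while (n != null && n.left != null) n = n.left;
        return n;
    }

    /** In-order traversal with no stack and no recursion. */
    static List<Integer> inOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        Node p = leftmost(root);
        while (p != null) {
            out.add(p.key);
            p = p.rightIsThread ? p.right : leftmost(p.right);
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k : new int[] {50, 30, 70, 20, 40, 60, 80}) root = insert(root, k);
        System.out.println(inOrder(root)); // [20, 30, 40, 50, 60, 70, 80]
    }
}
```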
Threaded Tree Example
Morris’ Stackless In-order Tree Traversal Example
Morris’ algorithm (shown in the next slide) achieves a stackless BST traversal
by making temporary changes to the tree during the traversal.
The changes involve right links that are null: the rightmost node of a left
subtree is temporarily pointed back at the parent of that subtree.
In the version shown, each altered link is reset to null (tmp.right = null) once
it has been followed, so the tree is back in its original shape when the
traversal ends.
While the traversal is in progress the structure is not a valid tree, so it must
not be read or modified by other code.
No additional information has to be retained to undo the changes, which is
consistent with the purpose of Morris’ algorithm, namely to eliminate the
run-time stack.
Morris’ Algorithm
BT p = this;
BT tmp;
while (p != null) {
    if (p.left == null) {
        visit(p);                 // process the node data
        p = p.right;
    } else {
        tmp = p.left;
        // go to the rightmost node of the left subtree,
        // or to the temporary parent of p
        while (tmp.right != null && tmp.right != p)
            tmp = tmp.right;
        if (tmp.right == null) {
            // the 'true' rightmost node was reached: make it a
            // temporary parent of the current root
            tmp.right = p;
            p = p.left;
        } else {
            // a temporary parent has been found: visit node p and then
            // cut the right pointer of the temporary parent, whereby it
            // ceases to be a parent
            visit(p);
            tmp.right = null;
            p = p.right;
        }
    }
}
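The loop above can be exercised in a self-contained form. The sketch below assumes a hypothetical MorrisDemo class with a bare Node type in place of the course's BT class, and collects the keys in a list instead of calling visit. Running the traversal twice on the same tree gives the same result, which shows that the temporary links have all been cut again by the time the loop ends.

```java
import java.util.ArrayList;
import java.util.List;

class MorrisDemo {
    static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    /** In-order traversal using neither a stack nor recursion. */
    static List<Integer> morrisInOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        Node p = root;
        while (p != null) {
            if (p.left == null) {
                out.add(p.key);
                p = p.right;
            } else {
                Node tmp = p.left;
                while (tmp.right != null && tmp.right != p)
                    tmp = tmp.right;
                if (tmp.right == null) {   // first arrival: leave a way back to p
                    tmp.right = p;
                    p = p.left;
                } else {                   // second arrival: left subtree is done
                    out.add(p.key);
                    tmp.right = null;      // remove the temporary link
                    p = p.right;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k : new int[] {50, 30, 70, 20, 40, 60, 80}) root = insert(root, k);
        System.out.println(morrisInOrder(root)); // [20, 30, 40, 50, 60, 70, 80]
        System.out.println(morrisInOrder(root)); // same output: the tree is unchanged
    }
}
```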
Node Deletion
Deleting a node from any tree is generally non-trivial and presents three
cases:
The node is a leaf node - simply replace the reference to the node with a
null reference.
The node only has one child - replace the reference to the node with a
reference to its single child node.
The node has two children - several solutions exist and we will examine
one (Delete by Merging) in more detail in the next slide.
The Delete by Merging solution:
Constructs a new ordered tree out of the two subtrees by merging them.
It then attaches this tree to the parent of the node to be deleted in its
place.
The height of the tree may be extended during the deletion and the tree
may also become quite unbalanced.
Delete (a node) by Merge - Pseudo Code
// assume p is the reference in the tree
// to the node to be deleted
if (p references a leaf node)
    set p to null
else if (p.right is null)
    set p to p.left
else if (p.left is null)
    set p to p.right
else
    tp = p.left
    while (tp.right != null)
        tp = tp.right
    tp.right = p.right
    p = p.left
Delete (a node) by Merge - A Java Problem
Consider a method deleteNode(BST bt, T data) where:
bt is a reference to the root node of the BST (or null if the BST is
empty), and
data is the value associated with the node in the BST to be
deleted.
If the node to be deleted is the root node of the BST, then we have a
problem.
The problem is that Java passes parameters by value.
Any change made to the root reference by this method e.g., such that
it references the new root node of the merged subtrees, does not
affect the reference to the root node of the BST in the calling
environment.
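The effect of pass-by-value on a root reference can be seen in a few lines. This is a hedged sketch with hypothetical names (PassByValueDemo, Node, deleteRootInPlace, deleteRoot); it is not the course's BST class. It also shows one common workaround, returning the new root so that the caller can store it, which is a different approach from the header-node representation developed in the following slides.

```java
class PassByValueDemo {
    static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    /** Builds a small tree: 50 with children 30 and 70. */
    static Node sample() {
        Node root = new Node(50);
        root.left = new Node(30);
        root.right = new Node(70);
        return root;
    }

    /** Delete by merging: hang the right subtree off the rightmost node of the left subtree. */
    static Node mergeChildren(Node p) {
        if (p.left == null) return p.right;
        Node tp = p.left;
        while (tp.right != null) tp = tp.right;
        tp.right = p.right;
        return p.left;
    }

    /** Only the local copy of the reference is reassigned; the caller's variable is unaffected. */
    static void deleteRootInPlace(Node root) {
        root = mergeChildren(root);
    }

    /** Returns the new root so that the caller can assign it. */
    static Node deleteRoot(Node root) {
        return mergeChildren(root);
    }

    public static void main(String[] args) {
        Node a = sample();
        deleteRootInPlace(a);
        System.out.println(a.key);   // 50: the caller still references the old root

        Node b = sample();
        b = deleteRoot(b);
        System.out.println(b.key);   // 30: the caller now references the merged tree
    }
}
```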
Delete (a node) by Merge - A Java Problem (cont’d.)
This problem could be solved if the root field
could be passed as a reference parameter.
That would allow the method to change the reference to the root node
of the BST that is held in the calling
environment.
Both C++ and C# support reference parameters.
The problem could also be solved by modifying the BST
representation and delete algorithm as shown in the following slides.
The algorithm requires an additional parameter that references the
Java field instance (probably root) that in turn references the BST
node at the root of the BST, to be passed to the method.
The new BST and Empty BST Representation
[Figure: the BST and the empty BST. Labels: Tree, root, data, left, right, EMPTY.]
Delete by Merge - Java Code - Part 1
public void deleteByMerging(T data) {
    BT p, pp, tmp;          // p: node to delete; pp: its parent; tmp: scratch
    boolean left = false;   // true if p is the left child of pp
    if (isEmpty())
        return;
    // root is taken to be a header node whose right link references the
    // tree (as p = root.getRight() implies), so it starts out as the parent
    pp = root;
    p = root.getRight();
    while (!p.isEmpty()) {
        if (p.compareTo(data) == 0)
            break;
        else if (p.compareTo(data) < 0) {
            left = false;
            pp = p; p = p.getRight();
        } else {
            left = true;
            pp = p; p = p.getLeft();
        }
    }
    // continued on the following slide
Delete by Merge - Java Code - Part 2
    System.out.println("*** " + p.getData());
    if (!p.isEmpty()) {
        if (p.getLeft().isEmpty() && p.getRight().isEmpty()) {   // leaf
            if (left)
                pp.setLeft(BT.EMPTY);
            else
                pp.setRight(BT.EMPTY);
        } else if (p.getRight().isEmpty()) {                     // left child only
            if (left)
                pp.setLeft(p.getLeft());
            else
                pp.setRight(p.getLeft());
        } else if (p.getLeft().isEmpty()) {                      // right child only
            if (left)
                pp.setLeft(p.getRight());
            else
                pp.setRight(p.getRight());
        } else {   // two children: continued on the following slide
Delete by Merge - Java Code - Part 3
            // two children (continued from the previous slide):
            // attach the right subtree to the rightmost node of the left subtree
            tmp = p.getLeft();
            while (!tmp.getRight().isEmpty())
                tmp = tmp.getRight();
            tmp.setRight(p.getRight());
            if (left)
                pp.setLeft(p.getLeft());
            else
                pp.setRight(p.getLeft());
        }
    }
}
Node Deletion by Copying
This solution effectively deletes a non leaf node by
replacing it with the data of its immediate predecessor.
The immediate predecessor has no right child, so it is either a leaf or has
only a left subtree, and is easily deleted.
This algorithm does not increase the height of the tree.
It is however asymmetric, as it always deletes the node of
the immediate predecessor of the node which is in effect
being deleted.
The development of this algorithm and its implementation
is left as an exercise.
Balancing Trees
In terms of the general discussion thus far it has been pointed out that:
Trees model the hierarchical structure of the data of certain problem domains very
well, and
Searching a tree for an item can be done faster than for a list.
The second claim is only true if we are able to ensure that the tree being searched
remains reasonably balanced.
For a balanced binary search tree containing n nodes the number of comparisons needed to
find an item will not exceed ⌊log2 n⌋ + 1 (for example, 3 comparisons for n = 7).
There are a number of techniques that can be employed to obtain balanced trees. They
include:
Ordering the data prior to inserting it into the tree, such that it produces a balanced
tree.
This suits applications which only ever search an established tree.
Rearrange the nodes in the tree after a new node has been inserted, to ensure that it
remains balanced.
This suits applications that add/delete nodes dynamically as the program runs.
Balancing Trees
The following neat routine is based on the first option.
It requires a previously sorted array containing the data.
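It then employs the following recursive routine to insert the element in the
middle of the sorted array into the tree, and then to treat the two halves on
either side of that element in the same way.
void balance(T[] data, int first, int last)
{
    if (first <= last) {
        int middle = first + (last - first) / 2;
        insert(data[middle]);               // middle element becomes the (sub)tree root
        balance(data, first, middle - 1);   // elements smaller than the middle
        balance(data, middle + 1, last);    // elements larger than the middle
    }
}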
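A runnable version of this routine, together with a plain BST insert, makes the effect on the height visible. The sketch assumes int keys and a hypothetical BalanceDemo class; the insert method is an ordinary unbalanced BST insertion written for the example, not the course's implementation.

```java
class BalanceDemo {
    static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    Node root;

    /** Ordinary (unbalanced) BST insertion. */
    void insert(int key) {
        Node n = new Node(key);
        if (root == null) { root = n; return; }
        Node p = root;
        while (true) {
            if (key < p.key) {
                if (p.left == null) { p.left = n; return; }
                p = p.left;
            } else {
                if (p.right == null) { p.right = n; return; }
                p = p.right;
            }
        }
    }

    /** Inserts data[first..last] (sorted) so that the resulting tree is balanced. */
    void balance(int[] data, int first, int last) {
        if (first <= last) {
            int middle = first + (last - first) / 2;
            insert(data[middle]);
            balance(data, first, middle - 1);
            balance(data, middle + 1, last);
        }
    }

    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 6, 7};

        BalanceDemo balanced = new BalanceDemo();
        balanced.balance(sorted, 0, sorted.length - 1);
        System.out.println(height(balanced.root));   // 3

        BalanceDemo degenerate = new BalanceDemo();
        for (int k : sorted) degenerate.insert(k);   // sorted input, inserted in order
        System.out.println(height(degenerate.root)); // 7
    }
}
```

Inserting the same sorted data from first to last produces a degenerate tree that is effectively a linked list, which is the case the routine avoids.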
Splay Trees (leans heavily on Bailey[2])
Because the process of adding a new value to a binary search tree is deterministic
(it produces the same result tree each time), and
because inspection of the tree does not modify its structure, one is stuck with the
performance of any degenerate tree constructed.
What might work better would be to allow the tree to reconfigure itself when operations
appear to be inefficient.
The splay tree quickly overcomes poor performance by rearranging the tree’s nodes on
the fly using a simple operation called a splay.
Instead of performing careful analysis and optimally modifying the structure whenever a
node is added or removed, the splay tree simply moves the referenced node to the top of
the tree.
The operation has the interesting characteristic that the average depth of the ancestors
of the node to be splayed is approximately halved.
As with skew heaps, the performance of a splay tree’s operators, when amortised over
many operations, is logarithmic.
Splay Tree Rotations
The basis for the splay operation is a pair of operations called rotations as
shown in the following slide.
Each of these rotations replaces the root of a subtree with one of its children.
A right rotation takes a left child, x, of a node y and reverses their relationship.
This induces certain obvious changes in connectivity of subtrees, but in all
other ways, the tree remains the same.
In particular, there is no structural effect on the tree above the original location
of node y.
A left rotation is precisely the opposite of a right rotation; these operations are
inverses of each other.
For each rotation accomplished, the non-root node moves upward by one
level.
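The two rotations can be sketched as follows. This is an illustration with hypothetical names (RotationDemo, Node), not the course's binary tree class. Each method returns the new root of the rotated subtree, and the caller is expected to re-attach it in place of the old one.

```java
class RotationDemo {
    static class Node {
        final int key;
        Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    /** Right rotation about y: the left child x of y becomes the subtree root. */
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;    // x's right subtree lies between x and y
        x.right = y;
        return x;
    }

    /** Left rotation about x: the right child y of x becomes the subtree root. */
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        return y;
    }

    static String inOrder(Node n) {
        return n == null ? "" : inOrder(n.left) + n.key + " " + inOrder(n.right);
    }

    public static void main(String[] args) {
        Node x = new Node(2, new Node(1, null, null), new Node(3, null, null));
        Node y = new Node(4, x, new Node(5, null, null));
        Node top = rotateRight(y);
        System.out.println(top.key);        // 2
        System.out.println(inOrder(top));   // 1 2 3 4 5
        top = rotateLeft(top);              // the inverse operation
        System.out.println(top.key);        // 4
    }
}
```

The in-order sequence is the same before and after each rotation, so the ordering property of a BST is preserved.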
Splay Tree Rotations (cont’d.)
The code for rotating a binary tree about a node is a method of the Binary-Tree class.
If x is the root, we are done.
If x is a left (or right) child of the root, rotate the tree to the right (or left) about the root. x
becomes the root and we are done.
If x is the left child of its parent p, which is, in turn, the left child of its grandparent g,
rotate right about g, followed by a right rotation about p.
A symmetric pair of rotations is possible if x is a right child of a right child. After double
rotation, continue splay of tree at x with this new tree.
If x is the right child of p, which is the left child of g, we rotate left about p, then right
about g.
The method is similar if x is the left child of a right child. Again, continue the splay at x in
the new tree.
After the splay has been completed, the node x is located at the root of the tree.
If node x were to be immediately accessed again (a strong possibility), the tree is clearly
optimised to handle this situation.
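The cases above can be combined into a short recursive splay. This is a sketch under stated assumptions, not the implementation from Bailey's book: SplayDemo and Node are hypothetical names, there are no parent links, and the recursion pairs up steps starting from the root, so the intermediate shapes may differ from those of a bottom-up splay. In every case the accessed key (or the last node reached, if the key is absent) ends up at the root.

```java
class SplayDemo {
    static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node rotateRight(Node y) { Node x = y.left; y.left = x.right; x.right = y; return x; }
    static Node rotateLeft(Node x) { Node y = x.right; x.right = y.left; y.left = x; return y; }

    /** Moves the node holding key to the root of the subtree h and returns it. */
    static Node splay(Node h, int key) {
        if (h == null) return null;
        if (key < h.key) {
            if (h.left == null) return h;                  // key is absent
            if (key < h.left.key) {                        // left child of a left child
                h.left.left = splay(h.left.left, key);
                h = rotateRight(h);                        // about the grandparent first
            } else if (key > h.left.key) {                 // right child of a left child
                h.left.right = splay(h.left.right, key);
                if (h.left.right != null) h.left = rotateLeft(h.left);
            }
            return h.left == null ? h : rotateRight(h);
        } else if (key > h.key) {
            if (h.right == null) return h;
            if (key > h.right.key) {                       // right child of a right child
                h.right.right = splay(h.right.right, key);
                h = rotateLeft(h);
            } else if (key < h.right.key) {                // left child of a right child
                h.right.left = splay(h.right.left, key);
                if (h.right.left != null) h.right = rotateRight(h.right);
            }
            return h.right == null ? h : rotateLeft(h);
        }
        return h;                                          // key is already at the root
    }

    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    static String inOrder(Node n) {
        return n == null ? "" : inOrder(n.left) + n.key + " " + inOrder(n.right);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k : new int[] {50, 30, 70, 20, 40, 60, 80}) root = insert(root, k);
        root = splay(root, 20);
        System.out.println(root.key);       // 20
        System.out.println(inOrder(root));  // 20 30 40 50 60 70 80
    }
}
```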
Splay Tree Rotations (cont’d.)
After splaying it is not the case that the tree becomes more balanced.
Clearly, if the tree is splayed at an extremal value, the tree is likely to be
extremely unbalanced.
An interesting feature, however, is that the depth of the nodes on the original
path from x to the root of the tree is, on average, halved.
Since the average depth of these nodes is halved, they clearly occupy
locations closer to the top of the tree where they may be more efficiently
accessed.
To guarantee that the splay has an effect on all operations, we simply perform
each of the binary search tree operations as before, but we splay the tree at
the node accessed or modified during the operation.
In the case of remove, we splay the tree at the parent of the value removed.
References
[1] How to Think Like a Computer Scientist, Downey, Allen B., URL:
http://www.greenteapress.com/thinkapjava/
[2] Java Structures : Data Structures in Java for the Principled Programmer, Bailey, Duane
A., URL: http://www.cs.williams.edu/JavaStructures/Welcome.html