Tree - National Cheng Kung University

Bioinformatics Programming
EE, NCKU
Tien-Hao Chang (Darby Chang)
Tree
A Tree Structure

A tree structure means that the data are organized so that items of information are related by branches

Definition of a Tree Structure

(recursive definition)
A tree is a finite set of one or more nodes such that
– there is a specially designated node called the root
– the remaining nodes are partitioned into n ≧ 0 disjoint sets T1, …, Tn, where each of these sets is a tree
– T1, …, Tn are called the sub-trees of the root

Every node in the tree is the root of some sub-tree

Tree
Some Terminology

node: the item of information plus the branches to other nodes
degree: the number of sub-trees of a node
degree of a tree: the maximum of the degrees of the nodes in the tree
terminal nodes (or leaves): nodes that have degree zero
non-terminal nodes: nodes that are not terminal nodes
children: the roots of the sub-trees of a node X are X’s children
parent: X is the parent of its children
siblings: children of the same parent are said to be siblings
ancestors (of a node): all the nodes along the path from the root to that node
level (of a node): defined by letting the root be at level one (if a node is at level l, then its children are at level l+1)
height (or depth): the maximum level of any node in the tree

An example

(Figure: a tree with root A at level 1; B and C at level 2; D and E, the children of B, and F, G and H, the children of C, at level 3; I, the child of H, at level 4)

– A is the root node
– B is the parent of D and E
– C is the sibling of B
– D and E are the children of B
– D, E, F, G, I are external nodes, or leaves
– A, B, C, H are internal nodes
– The level of E is 3
– The height (depth) of the tree is 4
– The degree of node B is 2
– The degree of the tree is 3
– The ancestors of node I are A, C, H
– The descendants of node C are F, G, H, I

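The statements in the example can be checked mechanically. The following is a minimal Python sketch, assuming the tree is stored as a dictionary from each node to its list of children; the helper names (`kids`, `degree`, `all_nodes`, `height`, `level_of`, `ancestors`) are made up for this illustration and do not come from the slides.

```python
# Example tree, written as parent -> list of children (leaves have no entry).
children = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G", "H"],
    "H": ["I"],
}

def kids(node):
    """Children of a node (empty list for a leaf)."""
    return children.get(node, [])

def degree(node):
    """Degree of a node = number of its sub-trees."""
    return len(kids(node))

def all_nodes(root):
    """The root followed by all of its descendants."""
    nodes = [root]
    for child in kids(root):
        nodes.extend(all_nodes(child))
    return nodes

def height(root, level=1):
    """Maximum level of any node, with the root at level 1."""
    return max([level] + [height(child, level + 1) for child in kids(root)])

def level_of(target, root, level=1):
    """Level of target, or None if it is not in the tree."""
    if root == target:
        return level
    for child in kids(root):
        found = level_of(target, child, level + 1)
        if found is not None:
            return found
    return None

def ancestors(target, root, path=()):
    """Nodes on the path from the root down to (excluding) target."""
    if root == target:
        return list(path)
    for child in kids(root):
        found = ancestors(target, child, path + (root,))
        if found is not None:
            return found
    return None

nodes = all_nodes("A")
leaves = [n for n in nodes if degree(n) == 0]
internal = [n for n in nodes if degree(n) > 0]
tree_degree = max(degree(n) for n in nodes)
descendants_of_c = all_nodes("C")[1:]
```

With this encoding, `leaves` holds D, E, F, G, I and `height("A")` is 4, in agreement with the list above.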
Representation of Trees

List Representation
– we can write the tree of Figure 5.2 as a list in which each of the sub-trees is also a list
• ( A ( B ( E ( K, L ), F ), C ( G ), D ( H ( M ), I, J ) ) )
– the root comes first, followed by a list of sub-trees

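As an illustration of the list representation, here is a small Python sketch that rebuilds the string above from a nested structure. The `node` and `to_list` helpers and the `(label, subtrees)` encoding are assumptions made for this example only.

```python
def node(label, *subtrees):
    """A tree is encoded as (label, list of sub-trees)."""
    return (label, list(subtrees))

def to_list(tree):
    """Root first, followed by a parenthesised list of its sub-trees."""
    label, subtrees = tree
    if not subtrees:
        return label
    return label + " ( " + ", ".join(to_list(s) for s in subtrees) + " )"

# The tree described by the list on the slide.
figure_tree = node(
    "A",
    node("B", node("E", node("K"), node("L")), node("F")),
    node("C", node("G")),
    node("D", node("H", node("M")), node("I"), node("J")),
)

text = "( " + to_list(figure_tree) + " )"
print(text)
```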
Left child-right sibling representation
Binary Trees

Binary trees are characterized by the fact that any node can have at most two branches
Definition (recursive):
– A binary tree is a finite set of nodes that is either empty or consists of a root and two disjoint binary trees called the left sub-tree and the right sub-tree
Thus the left sub-tree and the right sub-tree are distinguished
(Figure: a root A whose only child B is a left child, and a root A whose only child B is a right child; these are two different binary trees)
Any tree can be transformed into a binary tree
– by left child-right sibling representation

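The transformation can be sketched in a few lines of Python. This is only an illustration under assumed names (`BinaryNode`, `to_binary`) and an assumed `(label, subtrees)` encoding of the general tree: the left pointer of each binary node goes to the first child, the right pointer to the next sibling.

```python
class BinaryNode:
    """Binary tree node: left = first child, right = next sibling."""
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

def to_binary(tree, right_siblings=()):
    """Convert a general tree (label, [sub-trees]) into a binary tree.

    right_siblings holds the sub-trees that stand to the right of `tree`
    under the same parent.
    """
    label, subtrees = tree
    node = BinaryNode(label)
    if subtrees:
        node.left = to_binary(subtrees[0], subtrees[1:])
    if right_siblings:
        node.right = to_binary(right_siblings[0], right_siblings[1:])
    return node

# A general tree in which node C has three children.
general = ("A", [
    ("B", [("D", []), ("E", [])]),
    ("C", [("F", []), ("G", []), ("H", [("I", [])])]),
])

root = to_binary(general)
```

The root of the result never has a right child, because the root of the original tree has no sibling.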
The abstract data type of binary tree
Skewed and complete binary trees
Properties of Binary Trees

Lemma 5.1 [Maximum number of nodes]:
– The maximum number of nodes on level i of a binary tree is 2^(i-1), i ≧ 1
– The maximum number of nodes in a binary tree of depth k is 2^k - 1, k ≧ 1

Lemma 5.2 [Relation between number of leaf nodes and degree-2 nodes]:
– For any nonempty binary tree, T, if n0 is the number of leaf nodes and n2 is the number of nodes of degree 2, then n0 = n2 + 1

These lemmas allow us to define full and complete binary trees

Full/Complete Binary Tree

A full binary tree of depth k is a binary tree of depth k having 2^k - 1 nodes, k ≧ 0
A binary tree with n nodes and depth k is complete iff its nodes correspond to the nodes numbered from 1 to n in the full binary tree of depth k
From Lemma 5.1, the height of a complete binary tree with n nodes is ⌈log2(n+1)⌉

Binary Tree Representations
Using Array

Lemma 5.3: If a complete binary tree with n nodes is represented sequentially, then for any node with index i, 1 ≦ i ≦ n, we have
– parent(i) is at ⌊i/2⌋ if i ≠ 1
• if i = 1, i is at the root and has no parent
– left_child(i) is at 2i if 2i ≦ n
• if 2i > n, then i has no left child
– right_child(i) is at 2i+1 if 2i+1 ≦ n
• if 2i+1 > n, then i has no right child

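The index arithmetic of Lemma 5.3 translates directly into code. The Python sketch below assumes 1-based indexing with slot 0 of the array left unused; the function names follow the lemma, and returning `None` for a missing parent or child is a choice made for this example.

```python
def parent(i):
    """Index of the parent of node i, or None for the root."""
    return i // 2 if i != 1 else None

def left_child(i, n):
    """Index of the left child of node i in a tree of n nodes, or None."""
    return 2 * i if 2 * i <= n else None

def right_child(i, n):
    """Index of the right child of node i in a tree of n nodes, or None."""
    return 2 * i + 1 if 2 * i + 1 <= n else None

# A complete binary tree with n = 5 nodes; slot 0 is unused.
tree = [None, "A", "B", "C", "D", "E"]
n = len(tree) - 1

print(tree[parent(5)])          # parent of E
print(tree[left_child(2, n)])   # left child of B
print(left_child(3, n))         # C has no left child
```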
(Figure: a binary tree drawn on levels 1 to 3 with its nodes labelled by array indices [1] to [7], next to the corresponding array: [1] A, [2] B, [3] C, [4] empty, [5] D, [6] empty, [7] E)

Binary Tree Representations using Array
Drawbacks

Wasted space
– in the worst case, a skewed tree of depth k requires 2^k - 1 spaces
– of these, only k spaces will be occupied

Insertion or deletion of nodes from the middle of a tree requires the movement of potentially many nodes to reflect the change in the level of these nodes

Binary Tree Representations
Using Link
Binary Tree Traversals

How to traverse a tree or visit each node in the tree exactly once?
There are 6 possible combinations of traversal
– LVR, LRV, VLR, VRL, RVL, RLV
(L: moving left, V: visiting the node, R: moving right; each node has the fields left_child, data, right_child)

If we adopt the convention that we traverse left before right, only 3 traversals remain
– LVR (inorder)
– LRV (postorder)
– VLR (preorder)

Binary Tree
Arithmetic Expression

Arithmetic Expression using binary tree
– inorder traversal (infix expression)
• A/B*C*D+E
– preorder traversal (prefix expression)
• +**/ABCDE
– postorder traversal (postfix expression)
• AB/C*D*E+
– level order traversal
• +*E*D/CAB

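The four orders can be reproduced with a short Python sketch. The `Node` class and the function names are assumptions for this illustration; the expression tree is built to match the traversal results listed above, and the level-order version uses a queue.

```python
from collections import deque

class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def inorder(node):      # LVR
    return inorder(node.left) + [node.data] + inorder(node.right) if node else []

def preorder(node):     # VLR
    return [node.data] + preorder(node.left) + preorder(node.right) if node else []

def postorder(node):    # LRV
    return postorder(node.left) + postorder(node.right) + [node.data] if node else []

def level_order(root):
    """Visit nodes level by level, left to right, using a FIFO queue."""
    visited, queue = [], deque([root] if root else [])
    while queue:
        node = queue.popleft()
        visited.append(node.data)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
    return visited

# Expression tree for A/B*C*D+E.
expr = Node("+",
            Node("*",
                 Node("*",
                      Node("/", Node("A"), Node("B")),
                      Node("C")),
                 Node("D")),
            Node("E"))

print("".join(inorder(expr)))
print("".join(preorder(expr)))
print("".join(postorder(expr)))
print("".join(level_order(expr)))
```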
Level-order traversal, which requires a queue to implement
Copying binary trees, similar to postorder traversal
Testing equality: binary trees are equivalent if they have the same data and topology

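Copying and equality testing can both be written as short recursive functions. The following Python sketch uses assumed names (`Node`, `copy_tree`, `equal`); the copy handles both sub-trees before creating the new node, which is why it resembles a postorder traversal.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def copy_tree(node):
    """Return a new tree with the same data and topology."""
    if node is None:
        return None
    left = copy_tree(node.left)     # L
    right = copy_tree(node.right)   # R
    return Node(node.data, left, right)  # V: build the node last

def equal(a, b):
    """True iff both trees have the same data and the same shape."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return (a.data == b.data
            and equal(a.left, b.left)
            and equal(a.right, b.right))

original = Node("A", Node("B"), None)   # B is a left child
mirrored = Node("A", None, Node("B"))   # B is a right child
duplicate = copy_tree(original)
```

Here `equal(original, duplicate)` holds, while `equal(original, mirrored)` does not: the data match but the topology differs.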
Any Questions?
What is the time complexity of iter_inorder()?

Analysis of iter_inorder

Non-recursive inorder traversal
Let n be the number of nodes in the tree

Time complexity: O(n)
– every node of the tree is placed on and removed from the stack exactly once

Space complexity: O(n)
– the stack grows to the depth of the tree, which is n in the worst case (a skewed tree)

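The slide with the original iter_inorder() code is not part of this transcript. The Python sketch below is therefore only an assumed reconstruction of a stack-based inorder traversal; it also records the largest stack size reached, to illustrate the space bound on a skewed tree.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def iter_inorder(root):
    """Inorder traversal with an explicit stack.

    Returns (visited data, maximum stack size reached).
    """
    stack, visited, max_stack = [], [], 0
    node = root
    while True:
        # Push the whole left spine starting at node.
        while node is not None:
            stack.append(node)
            max_stack = max(max_stack, len(stack))
            node = node.left
        if not stack:
            break
        node = stack.pop()          # each node is popped exactly once
        visited.append(node.data)
        node = node.right
    return visited, max_stack

# A left-skewed tree with n = 4 nodes: D -> C -> B -> A.
skewed = None
for label in "ABCD":
    skewed = Node(label, left=skewed)

order, depth_used = iter_inorder(skewed)
```

For the skewed tree, every node sits on the stack at the same time before the first pop, so the stack reaches size n.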
Heap
Heap

A max/min tree is a tree in which the key value in each node is no smaller (larger) than the key values in its children
A max/min heap is a complete binary tree that is also a max/min tree
Basic operations:
– creation of an empty heap
– insertion of a new element into a heap
– deletion of the largest/smallest element from the heap

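The three basic operations can be sketched on top of the array representation of a complete binary tree. This Python example is a minimal illustration for a max heap, assuming 1-based indexing with slot 0 unused; the names `heap_insert` and `heap_delete_max` are made up for the example.

```python
def heap_insert(heap, item):
    """Append item at the end, then bubble it up towards the root."""
    heap.append(item)
    i = len(heap) - 1
    while i > 1 and heap[i // 2] < heap[i]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2

def heap_delete_max(heap):
    """Remove and return the largest element (the root)."""
    if len(heap) == 1:
        raise IndexError("delete from an empty heap")
    top = heap[1]
    last = heap.pop()
    if len(heap) > 1:
        heap[1] = last              # move the last element to the root
        i, n = 1, len(heap) - 1
        while 2 * i <= n:           # trickle down
            child = 2 * i
            if child + 1 <= n and heap[child + 1] > heap[child]:
                child += 1          # pick the larger child
            if heap[i] >= heap[child]:
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return top

heap = [None]                       # creation of an empty heap
for key in [20, 15, 2, 14, 10, 21]:
    heap_insert(heap, key)
snapshot = list(heap)

drained = []
while len(heap) > 1:
    drained.append(heap_delete_max(heap))
```

Both operations move along a single root-to-leaf path, so each costs O(log2 n) for a heap of n elements.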
Priority Queues

Heaps are frequently used to implement priority queues
– delete the element with the highest (lowest) priority
– insert an element with arbitrary priority
Question: is a heap the only way to implement a priority queue?
An example: Huffman coding

Deletion from a max heap
Any Questions?
Can we use an array (ordered or unordered) or a list (ordered or unordered) to implement priority queues?
A further question: what are the complexities?

Binary Search Trees
Binary Search Trees

A heap is not suited for applications in which arbitrary elements are to be deleted from the element list
– deletion of the max/min element: O(log2 n)
– deletion of an arbitrary element: O(n)
– search for an arbitrary element: O(n)

Definition of a binary search tree:
– every element has a unique key
– the keys in a nonempty left/right sub-tree are smaller/larger than the key in the root
– the left and right sub-trees are also binary search trees

A heap maintains order vertically (between parent and child), while a binary search tree maintains it horizontally (between left and right sub-trees)

(Figure: Search(25) and Search(76) in a binary search tree containing the keys 44, 17, 88, 65, 32, 28, 97, 54, 29, 82, 76, 80)

O(height)
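A search follows one path from the root, which is where the O(height) cost comes from. The Python sketch below uses the keys from the figure; the insertion order is an assumption (the figure's exact shape is not recoverable from the transcript), and `BSTNode`, `insert` and `search` are names chosen for this example.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key into the sub-tree rooted at root; return the root."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                     # duplicate keys are ignored

def search(root, key):
    """Return (found, keys compared along the search path)."""
    path = []
    node = root
    while node is not None:
        path.append(node.key)
        if key == node.key:
            return True, path
        node = node.left if key < node.key else node.right
    return False, path

# Assumed insertion order for the keys shown in the figure.
root = None
for k in [44, 17, 88, 32, 65, 97, 28, 54, 82, 29, 76, 80]:
    root = insert(root, k)

print(search(root, 25))
print(search(root, 76))
```

An unsuccessful search such as Search(25) ends when it falls off the tree; a successful one such as Search(76) stops at the matching node.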
Binary Search Tree
Deletion

Three cases should be considered
– leaf: delete it
– one child: delete the node and change the parent's pointer to point to this child
– two children: replace the node with either the smallest element in the right sub-tree or the largest element in the left sub-tree, then delete that element from its sub-tree

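The three cases can be sketched as one recursive function. This Python example is an illustration with assumed names (`BSTNode`, `insert`, `delete`, `inorder`); for the two-children case it picks the smallest element of the right sub-tree, which is one of the two options named above.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def delete(root, key):
    """Delete key from the sub-tree rooted at root; return the new root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Leaf or one child: hand the (possibly empty) child to the parent.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Two children: copy the smallest key of the right sub-tree here,
        # then delete that key from the right sub-tree.
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root

def inorder(node):
    return inorder(node.left) + [node.key] + inorder(node.right) if node else []

root = None
for k in [44, 17, 88, 32, 65, 97]:
    root = insert(root, k)

root = delete(root, 97)   # leaf
root = delete(root, 17)   # one child (32)
root = delete(root, 44)   # two children; 65 takes its place
```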
Height of a Binary Search Tree

The height of a binary search tree with n elements can become as large as n
It can be shown that when insertions and deletions are made at random, the height of the binary search tree is O(log2 n) on the average
Search trees with a worst-case height of O(log2 n) are called balanced search trees

Binary Search Trees
Time Complexity

Searching, insertion, deletion
– O(h), where h is the height of the tree

Worst case: skewed binary tree
– O(n), where n is the number of internal nodes

Prevent worst case
– rebalancing scheme
– AVL, 2-3, and red-black tree

Complete Link
In: a symmetric matrix
Out: the tree
Requirement
- complete link algorithm
- teamwork is encouraged
- a report of how the work is split and why
- time/space analyses
- using C would be the best
Bonus
- output n clusters given n
- single/average link

Deadline
2010/4/13 23:59
Zip your code, a step-by-step README of how to execute the code, and anything worthy of extra credit. Email to [email protected].

Hierarchical Clustering

Hierarchical clustering takes as input a set of points
Produces a set of nested clusters organized as a hierarchical tree
Can be visualized as a dendrogram
– a tree-like diagram that records the sequences of merges
(Figure: six points, numbered 1 to 6, grouped into nested clusters, and the corresponding dendrogram with merge heights between 0 and 0.4)


The method is summarized below:
– place all points into their own clusters
– while there is more than one cluster, do
• merge the closest pair of clusters

The behavior of the algorithm depends on how “closest pair of clusters” is defined

Complete Link

Distance between two clusters Ci and Cj is the maximum distance between any object in Ci and any object in Cj
The distance is defined by the two most dissimilar objects
– D(Ci, Cj) = max_{a,b} { d_ab | a ∈ Ci, b ∈ Cj }

      1     2     3     4     5
1  0.00  0.10  0.90  0.35  0.80
2  0.10  0.00  0.30  0.40  0.50
3  0.90  0.30  0.00  0.60  0.70
4  0.35  0.40  0.60  0.00  0.20
5  0.80  0.50  0.70  0.20  0.00

(Figure: dendrogram over the points 1, 2, 3, 4, 5)
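To make the definition concrete, the following Python sketch runs the merge loop on the 5 × 5 distance matrix above. It is a naive illustration, not a reference solution: the function name `complete_link`, the nested-tuple output format and the tie-breaking (first closest pair found) are all assumptions of this example.

```python
def complete_link(dist):
    """Naive complete-link clustering on a symmetric distance matrix.

    Returns (merges, tree): merges is a list of (left, right, distance)
    in merge order, and tree is the final hierarchy as nested tuples of
    1-based point labels.
    """
    n = len(dist)
    members = {i: [i] for i in range(n)}    # cluster id -> point indices
    trees = {i: i + 1 for i in range(n)}    # cluster id -> nested tuple
    merges = []
    next_id = n
    while len(members) > 1:
        best = None
        ids = sorted(members)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Complete link: the two most dissimilar objects decide.
                d = max(dist[p][q] for p in members[a] for q in members[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((trees[a], trees[b], d))
        members[next_id] = members.pop(a) + members.pop(b)
        trees[next_id] = (trees.pop(a), trees.pop(b))
        next_id += 1
    return merges, trees[next_id - 1]

matrix = [
    [0.00, 0.10, 0.90, 0.35, 0.80],
    [0.10, 0.00, 0.30, 0.40, 0.50],
    [0.90, 0.30, 0.00, 0.60, 0.70],
    [0.35, 0.40, 0.60, 0.00, 0.20],
    [0.80, 0.50, 0.70, 0.20, 0.00],
]

merges, tree = complete_link(matrix)
for left, right, d in merges:
    print(left, right, d)
```

On this matrix the sketch merges 1 with 2 at 0.10, 4 with 5 at 0.20, then 3 with {4, 5} at 0.70, and finally the two remaining clusters at 0.90. Recomputing every cluster distance from scratch keeps the code short but costs far more than an implementation that updates a distance table after each merge.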